
GPU accelerated encoder #895

Open · wants to merge 30 commits into master
Conversation


@dmanc dmanc commented Nov 14, 2024

Why are these changes needed?

This PR adds support for multiple backends for the encoder. Initially we only have the default and icicle backends, see https://dev.ingonyama.com/icicle/overview for more information on the icicle backend.

A backend may include GPU acceleration as an option, see https://dev.ingonyama.com/icicle/install_cuda_backend for icicle's GPU accelerated backend.

In order to use the icicle backend it must be compiled with the icicle build tag:

go build -tags=icicle main.go

This PR also adds a refactor for passing configuration to the encoding library, e.g.

opts := []prover.ProverOption{
	prover.WithKZGConfig(&config.EncoderConfig),
	prover.WithLoadG2Points(false),
	prover.WithBackend(backendType),
	prover.WithGPU(config.ServerConfig.EnableGPU),
	prover.WithRSEncoder(rsEncoder),
}
prover, err := prover.NewProver(opts...)

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@samlaf samlaf (Contributor) left a comment

Yikes lost all my comments while reviewing on github.dev.... friggin github. Some graphql internal error or something. So I'll just submit this first set of reviews in case. Don't really feel like reviewing again lol... another reason why small PRs are gud, big PRs bad.

api/clients/retrieval_client_test.go (resolved)
core/test/core_test.go (resolved)
disperser/cmd/encoder/flags/flags.go (resolved)
disperser/cmd/encoder/config.go (resolved)
@samlaf samlaf (Contributor) left a comment

Just spent 2 hours reviewing and I barely even got to the important stuff. Suggest no more PRs like this, for the sake of our industry, and of your reviewers' mental health. :D

docker-bake.hcl Outdated
Contributor

Do we have a lint config setup for hcl files? We should if not.

Contributor Author

I'm not sure what tool can lint it; I just gave it to Claude.

Comment on lines +10 to +11
benchmark_icicle:
go run -tags=icicle main.go -cpuprofile cpu.prof -memprofile mem.prof
Contributor

Curious, is there any reason you're not using go benchmarks https://pkg.go.dev/testing#hdr-Benchmarks? Is there some feature that didn't work there?

Contributor Author

I'm just not sure how to use it. Also, the benchmarking code here was used to compare the default CPU implementation vs. the icicle GPU implementation in terms of speed-up, so I needed the raw timings.
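For reference, the standard library can produce raw timings without the full `go test -bench` harness, via `testing.Benchmark`. This is only a sketch under stated assumptions: `encodeOnce` is a hypothetical stand-in for a single encoder call, not this PR's code.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// encodeOnce is a hypothetical stand-in for a single encode call.
func encodeOnce() {
	time.Sleep(time.Millisecond)
}

func main() {
	// testing.Benchmark runs the function with an auto-scaled b.N
	// and returns a BenchmarkResult with raw timing data, usable
	// from a plain main() outside `go test`.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			encodeOnce()
		}
	})
	fmt.Printf("%d iterations, %s per op\n", res.N, time.Duration(res.NsPerOp()))
}
```

`res.NsPerOp()` gives the raw per-operation timing programmatically, which could serve the CPU-vs-GPU comparison described above.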

Comment on lines 17 to 23
type IcicleDevice struct {
Device icicle_runtime.Device
NttCfg core.NTTConfig[[icicle_bn254.SCALAR_LIMBS]uint32]
MsmCfg core.MSMConfig
FlatFFTPointsT []icicle_bn254.Affine
SRSG1Icicle []icicle_bn254.Affine
}
Contributor

Is the point that we can have a different ntt/msm config and points loaded on different devices (gpus?)? Do we actually make use of this flexibility? If so, please add comment explaining.

Contributor Author

No there's only going to be a single configuration loaded at startup time. Right now we don't support multiple GPUs for an encoder.

type IcicleDeviceConfig struct {
EnableGPU bool
NTTSize uint8
// MSM setup parameters (optional)
Contributor

why are they optional?

Contributor Author

The Reed-Solomon encoder only uses NTT, so it doesn't need the MSM configuration.

Comment on lines 58 to 65
setupErr = fmt.Errorf("could not setup NTT")
return
Contributor

just return the setupErr directly?

Contributor

also prob want to wrap the icicleErr to help debug

Contributor Author

I can't return directly, because I don't think the RunOnDevice func will let me. It looks like this:

func RunOnDevice(device *Device, funcToRun func(args ...any), args ...any) {
    go func(deviceToRunOn *Device) {
        defer runtime.UnlockOSThread()
        runtime.LockOSThread()
        originalDevice, _ := GetActiveDevice()
        SetDevice(deviceToRunOn)
        funcToRun(args...)
        SetDevice(originalDevice)
    }(device)
}
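One pattern that works within that constraint is to have the closure send its error on a buffered channel. This is a sketch with a stand-in `runOnDevice` and `setupNTT` (hypothetical names mimicking icicle's shapes), not icicle's actual runtime:

```go
package main

import (
	"errors"
	"fmt"
)

// runOnDevice mimics the shape of icicle's runtime.RunOnDevice: it
// runs funcToRun on a new goroutine, so funcToRun cannot return a
// value to the caller directly.
func runOnDevice(funcToRun func(args ...any), args ...any) {
	go func() {
		funcToRun(args...)
	}()
}

// setupNTT is a hypothetical stand-in for the NTT setup call.
func setupNTT() error {
	return errors.New("could not setup NTT")
}

func main() {
	// A buffered channel captured by the closure surfaces the error
	// back to the launching goroutine.
	errCh := make(chan error, 1)
	runOnDevice(func(args ...any) {
		errCh <- setupNTT()
	})
	if err := <-errCh; err != nil {
		fmt.Println("setup failed:", err) // prints "setup failed: could not setup NTT"
	}
}
```

Wrapping the icicle error before sending it (e.g. with `fmt.Errorf("ntt setup: %w", err)`) would also address the debugging comment above.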

encoding/icicle/device_setup.go (resolved)
"github.com/ingonyama-zk/icicle/v3/wrappers/golang/runtime"
)

func SetupMsm(rowsG1 [][]bn254.G1Affine, srsG1 []bn254.G1Affine) ([]icicle_bn254.Affine, []icicle_bn254.Affine, core.MSMConfig, core.MSMConfig, runtime.EIcicleError) {
Contributor

Add a comment explaining what this is doing. Why are we returning two MSMConfigs for G1 and G2 if they are the same, for example?

Contributor Author

G2 MSM is not used right now but if we wanted to accelerate the length proof commitment then we would need it. Going to remove it from the configuration for now.

encoding/icicle/ntt_setup.go (resolved)
@jianoaix jianoaix (Contributor) left a comment

Can we have a diff-neutral PR just for the refactoring? It seems there is code here that doesn't need deep review, as it's just existing code moved around?

encoding/rs/serialization.go (resolved)

func CreateIcicleBackendEncoder(p *Encoder, params encoding.EncodingParams, fs *fft.FFTSettings) (*ParametrizedEncoder, error) {
// Not supported
return nil, fmt.Errorf("icicle backend called without icicle build tag")
Contributor

May use errors.New("...") since this has no formatting needs

encoding/rs/serialization.go (resolved)

dmanc commented Nov 15, 2024

Can we have a diff-neutral PR just for the refactoring? It seems there is code here that doesn't need deep review, as it's just existing code moved around?

Yes, if we're happy with this new configuration method, then I'll make another PR with that and rebase this one.

RUN go build -tags=icicle -o ./bin/server ./cmd/encoder

# Start a new stage for the base image
FROM nvidia/cuda:12.2.2-base-ubuntu22.04
Contributor Author

Image size comparison:

ghcr.io/layr-labs/eigenda/encoder latest 781c3866c5dd 6 hours ago 41MB
6129846e8150 6 hours ago 41MB
ghcr.io/layr-labs/eigenda/encoder-icicle latest f530fe9c250d 21 hours ago 760MB

prover.WithGPU(config.ServerConfig.GPUEnable),
prover.WithRSEncoder(rsEncoder),
}
prover, err := prover.NewProver(popts...)
Contributor Author

Trying to follow options pattern with this change https://golang.cafe/blog/golang-functional-options-pattern.html


func CreateIcicleBackendProver(p *Prover, params encoding.EncodingParams, fs *fft.FFTSettings, ks *kzg.KZGSettings) (*ParametrizedProver, error) {
// Not supported
return nil, fmt.Errorf("icicle backend called without icicle build tag")
Contributor Author

The point of the noicicle.go file is that the icicle backend should only be functional when the icicle build tag is used. This way we don't need to include the icicle backend files in the traditional encoder build, which could otherwise break the existing build.
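The two-file layout behind this can be sketched as follows. Shown here as one runnable file for illustration, with the `//go:build` lines that would normally gate each file given in comments; the function name is illustrative, not the PR's exact API.

```go
package main

import (
	"errors"
	"fmt"
)

// In the real layout this stub would live in a file guarded by
// `//go:build !icicle`, so it compiles only when the icicle tag
// is absent and plain `go build` never pulls in icicle's cgo deps.
func createIcicleBackend() (any, error) {
	return nil, errors.New("icicle backend called without icicle build tag")
}

// A sibling file guarded by `//go:build icicle` would provide the
// working implementation, selected by `go build -tags=icicle`.

func main() {
	if _, err := createIcicleBackend(); err != nil {
		fmt.Println(err) // prints "icicle backend called without icicle build tag"
	}
}
```

With both files present, the linker sees exactly one definition of the function for any given tag set, which is what keeps the default build working.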

}

// KzgCommitmentsBackend represents a backend capable of computing various KZG commitments.
type KzgCommitmentsBackend interface {
Contributor Author

Moved commitments to its own backend in case we would like to accelerate them in the future.

@@ -331,3 +457,99 @@ func toUint64Array(chunkIndices []encoding.ChunkNumber) []uint64 {
}
return res
}

func (p *Prover) newProver(params encoding.EncodingParams) (*ParametrizedProver, error) {
Contributor Author

this file should be reviewed carefully

)

// VerifierOption defines a function that configures a Verifier
type VerifierOption func(*Verifier) error
Contributor Author

This can go in its own PR; it's applying the same options-pattern refactor to the verifier, so it's not super relevant to the GPU encoder.

Contributor

It looks more verbose and deviates from the convention; what are the main problems it solves here compared to passing in a config struct?

Contributor Author

One argument I would give for this configuration pattern is that it provides a simpler constructor that is extensible and backwards compatible.

For example, with a config struct we would have to modify every client that calls the constructor whenever we add a new configuration value. With this pattern, you add a WithX function in the library that sets the field and does whatever validation it needs, and clients can then optionally adopt the new configuration.
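That argument can be sketched concretely. The names below are illustrative stand-ins, not the PR's actual prover API; the point is the shape of the pattern: defaults first, then each option applied with its own validation.

```go
package main

import "fmt"

// Prover holds configuration set via options (illustrative fields).
type Prover struct {
	loadG2Points bool
	backend      string
}

// ProverOption configures a Prover and may reject invalid values.
type ProverOption func(*Prover) error

func WithBackend(b string) ProverOption {
	return func(p *Prover) error {
		if b == "" {
			return fmt.Errorf("backend must not be empty")
		}
		p.backend = b
		return nil
	}
}

func WithLoadG2Points(load bool) ProverOption {
	return func(p *Prover) error {
		p.loadG2Points = load
		return nil
	}
}

// NewProver applies defaults, then each option in order. Adding a
// new WithX later doesn't change this signature, so existing
// callers keep compiling unchanged.
func NewProver(opts ...ProverOption) (*Prover, error) {
	p := &Prover{backend: "default"} // defaults live here
	for _, opt := range opts {
		if err := opt(p); err != nil {
			return nil, err
		}
	}
	return p, nil
}

func main() {
	p, err := NewProver(WithBackend("icicle"), WithLoadG2Points(false))
	if err != nil {
		panic(err)
	}
	fmt.Println(p.backend, p.loadG2Points) // prints "icicle false"
}
```

Cross-field invariants can still be checked once, after the option loop in NewProver, which partly answers the multi-field-validation concern raised below.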

Contributor

with a config struct we would have to modify every client that calls the constructor when we add a new configuration value

Not sure about this, if that new field is optional, the caller doesn't need to do anything. If it's required, callers will have to make changes in either case.

My concerns are the boilerplate and the scalability of continually adding WithXX; we may end up with a long list of fields (like in some big structs).
Also it looks like it still needs to operate on a config struct if the validation runs across multiple fields (e.g. invariants defined over multiple fields).

@dmanc dmanc (Contributor Author) Nov 16, 2024

My concerns are the boilerplate and the scalability of continually adding WithXX; we may end up with a long list of fields (like in some big structs).

These seem like tractable problems though; here is an example in the grpc library:
https://github.com/grpc/grpc-go/blob/66385b28b3fe71a4895f00d581ede0a344743a3f/dialoptions.go#L248

"github.com/consensys/gnark-crypto/ecc/bn254/fr"
)

type ParametrizedEncoder struct {
Contributor Author

Refactor to make the encoder parametrized based on FFT settings (similar to the Prover). I don't think it's too valuable, since the max time for loading the largest FFT setting is ~600ms, so I'm thinking of reverting.

@@ -0,0 +1,103 @@
//go:build icicle

package icicle
Contributor Author

This folder contains the icicle-backend-related setup.

@@ -0,0 +1,46 @@
//go:build icicle

package icicle
Contributor Author

This folder contains the icicle-related computations for MultiProofs.

"github.com/ingonyama-zk/icicle/v3/wrappers/golang/runtime"
)

type RsIcicleComputeDevice struct {
Contributor Author

This contains the icicle-related computations for RS encode.


dmanc commented Nov 15, 2024

One thing I'm not too sure how to deal with is testing the icicle/GPU encoder's correctness. The only strategy I can think of is to compare the output of the default backend and the icicle backend.

Also, if we wanted it to be part of CI, we would need to acquire GPU-based GitHub Actions runners.
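That comparison strategy can be sketched as a differential test. Both "backends" below are toy stand-ins so the sketch runs anywhere; the real test would call the default and icicle provers on the same inputs. The harness shape (fixed seed, random inputs, first-mismatch report) is the point.

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
)

// referenceEncode stands in for the default CPU backend and
// acceleratedEncode for the icicle backend; here both compute the
// same toy transform (multiply by 3) two different ways.
func referenceEncode(data []uint64) []uint64 {
	out := make([]uint64, len(data))
	for i, v := range data {
		out[i] = v * 3
	}
	return out
}

func acceleratedEncode(data []uint64) []uint64 {
	out := make([]uint64, len(data))
	for i, v := range data {
		out[i] = v + v + v
	}
	return out
}

// differentialCheck feeds identical random inputs to both backends
// and reports the first disagreement.
func differentialCheck(trials, size int) error {
	rng := rand.New(rand.NewSource(42)) // fixed seed: failures are reproducible
	for t := 0; t < trials; t++ {
		data := make([]uint64, size)
		for i := range data {
			data[i] = rng.Uint64() % 1000
		}
		ref, acc := referenceEncode(data), acceleratedEncode(data)
		if !reflect.DeepEqual(ref, acc) {
			return fmt.Errorf("trial %d: backends disagree", t)
		}
	}
	return nil
}

func main() {
	if err := differentialCheck(100, 64); err != nil {
		panic(err)
	}
	fmt.Println("backends agree on all trials")
}
```

Gating such a test behind the icicle build tag would let it run only on the GPU runners mentioned above, while the default CI stays unchanged.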

encoding/rs/icicle.go (resolved)

func CreateIcicleBackendEncoder(e *Encoder, params encoding.EncodingParams, fs *fft.FFTSettings) (*ParametrizedEncoder, error) {
icicleDevice, err := icicle.NewIcicleDevice(icicle.IcicleDeviceConfig{
GPUEnable: e.Config.GPUEnable,
Contributor

It looks like icicle will also support CPU now?

Contributor Author

Yes, but it's not as performant as the gnark backend; I'm trying to look into why.


var evals []fr.Element
g.NttCfg.BatchSize = int32(1)
runtime.RunOnDevice(&g.Device, func(args ...any) {
Contributor

If the GPU is enabled, then this func will be executed on the GPU by the runtime?

Contributor Author

Yes, because the device variable will be set to the GPU device here.

Contributor

For performance, it looks like the main data movement to/from the GPU will be the coefficients and the resulting evaluations?

Contributor Author

For RS encode, yes.

For multiproofs it's also the same (move coefficients in, move proofs out), but in addition we need to transfer the SRS table to compute the MSM. This could be done in SetupMsm at initialization, but I'm not sure how much speedup that gives.


return &ParametrizedEncoder{
	Fs:          fs,
	verbose:     verbose,
	NumRSWorker: runtime.GOMAXPROCS(0),
Contributor

So when you say "backend", it means a software library that computes the RS encoding? I.e., it decouples the device (CPU/GPU) from the backend (the software that executes the compute on the device)?

Contributor Author

Right now I consider a backend to be a library that implements the cryptographic primitives like NTT and MSM. We have our own implementation of NTT borrowed from protolambda, and MSM from gnark.

It's plausible that there exist libraries implementing the higher-level primitives like multiproofs, commitments, and RS encode, and we could make our code more of a frontend that calls those. But for now we focus on the core primitives that give the speedup factor.

encoding/rs/extend_poly.go (resolved)
@dmanc dmanc marked this pull request as ready for review November 16, 2024 00:03
3 participants