
GPU accelerated encoder #895

Open · wants to merge 30 commits into master
Conversation


@dmanc dmanc commented Nov 14, 2024

Why are these changes needed?

This PR adds support for multiple backends for the encoder. Initially we only have the default and icicle backends, see https://dev.ingonyama.com/icicle/overview for more information on the icicle backend.

A backend may include GPU acceleration as an option, see https://dev.ingonyama.com/icicle/install_cuda_backend for icicle's GPU accelerated backend.

In order to use the icicle backend it must be compiled with the icicle build tag:

go build -tags=icicle main.go

This PR also adds a refactor for passing configuration to the encoding library, e.g.

opts := []prover.ProverOption{
	prover.WithKZGConfig(&config.EncoderConfig),
	prover.WithLoadG2Points(false),
	prover.WithBackend(backendType),
	prover.WithGPU(config.ServerConfig.EnableGPU),
	prover.WithRSEncoder(rsEncoder),
}
prover, err := prover.NewProver(opts...)

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@samlaf samlaf (Contributor) left a comment

Yikes lost all my comments while reviewing on github.dev.... friggin github. Some graphql internal error or something. So I'll just submit this first set of reviews in case. Don't really feel like reviewing again lol... another reason why small PRs are gud, big PRs bad.

api/clients/retrieval_client_test.go (resolved)
core/test/core_test.go (resolved)
disperser/cmd/encoder/flags/flags.go (resolved)
disperser/cmd/encoder/config.go (resolved)
@samlaf samlaf (Contributor) left a comment

Just spent 2 hours reviewing and I barely even got to the important stuff. Suggest no more PRs like this, for the sake of our industry, and of your reviewers' mental health. :D

docker-bake.hcl Outdated
Contributor

Do we have a lint config setup for hcl files? We should if not.

Contributor Author

I'm not sure what tool can lint it; I just gave it to Claude.

Comment on lines +10 to +11
benchmark_icicle:
go run -tags=icicle main.go -cpuprofile cpu.prof -memprofile mem.prof
Contributor

Curious, is there any reason you're not using go benchmarks https://pkg.go.dev/testing#hdr-Benchmarks? Is there some feature that didn't work there?

Contributor Author

I'm just not sure how to use it. Also, the benchmarking code here was used to compare the default CPU implementation vs. the icicle GPU implementation in terms of speed-up, so I needed the raw timings.
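For reference, the standard library can produce raw timings without the full `go test -bench` harness, via `testing.Benchmark`. This is only a sketch under stated assumptions: `encodeOnce` is a hypothetical stand-in for a single encoder call, not this PR's code.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// encodeOnce is a hypothetical stand-in for a single encode call.
func encodeOnce() {
	time.Sleep(time.Millisecond)
}

func main() {
	// testing.Benchmark runs the function with an auto-scaled b.N
	// and returns a BenchmarkResult with raw timing data, usable
	// from a plain main() outside `go test`.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			encodeOnce()
		}
	})
	fmt.Printf("%d iterations, %s per op\n", res.N, time.Duration(res.NsPerOp()))
}
```

`res.NsPerOp()` gives the raw per-operation timing programmatically, which could serve the CPU-vs-GPU comparison described above.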

Comment on lines 17 to 23
type IcicleDevice struct {
Device icicle_runtime.Device
NttCfg core.NTTConfig[[icicle_bn254.SCALAR_LIMBS]uint32]
MsmCfg core.MSMConfig
FlatFFTPointsT []icicle_bn254.Affine
SRSG1Icicle []icicle_bn254.Affine
}
Contributor

Is the point that we can have a different ntt/msm config and points loaded on different devices (gpus?)? Do we actually make use of this flexibility? If so, please add comment explaining.

Contributor Author

No there's only going to be a single configuration loaded at startup time. Right now we don't support multiple GPUs for an encoder.

type IcicleDeviceConfig struct {
EnableGPU bool
NTTSize uint8
// MSM setup parameters (optional)
Contributor

why are they optional?

Contributor Author

The Reed-Solomon encoder only uses NTT, so it doesn't need the MSM configuration.

Comment on lines 58 to 65
setupErr = fmt.Errorf("could not setup NTT")
return
Contributor

just return the setupErr directly?

Contributor

also prob want to wrap the icicleErr to help debug

Contributor Author

I can't return directly, because I don't think the RunOnDevice func will let me. It looks like this:

func RunOnDevice(device *Device, funcToRun func(args ...any), args ...any) {
    go func(deviceToRunOn *Device) {
        defer runtime.UnlockOSThread()
        runtime.LockOSThread()
        originalDevice, _ := GetActiveDevice()
        SetDevice(deviceToRunOn)
        funcToRun(args...)
        SetDevice(originalDevice)
    }(device)
}
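One pattern that works within that constraint is to have the closure send its error on a buffered channel. This is a sketch with a stand-in `runOnDevice` and `setupNTT` (hypothetical names mimicking icicle's shapes), not icicle's actual runtime:

```go
package main

import (
	"errors"
	"fmt"
)

// runOnDevice mimics the shape of icicle's runtime.RunOnDevice: it
// runs funcToRun on a new goroutine, so funcToRun cannot return a
// value to the caller directly.
func runOnDevice(funcToRun func(args ...any), args ...any) {
	go func() {
		funcToRun(args...)
	}()
}

// setupNTT is a hypothetical stand-in for the NTT setup call.
func setupNTT() error {
	return errors.New("could not setup NTT")
}

func main() {
	// A buffered channel captured by the closure surfaces the error
	// back to the launching goroutine.
	errCh := make(chan error, 1)
	runOnDevice(func(args ...any) {
		errCh <- setupNTT()
	})
	if err := <-errCh; err != nil {
		fmt.Println("setup failed:", err) // prints "setup failed: could not setup NTT"
	}
}
```

Wrapping the icicle error before sending it (e.g. with `fmt.Errorf("ntt setup: %w", err)`) would also address the debugging comment above.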

encoding/icicle/device_setup.go (resolved)
"github.com/ingonyama-zk/icicle/v3/wrappers/golang/runtime"
)

func SetupMsm(rowsG1 [][]bn254.G1Affine, srsG1 []bn254.G1Affine) ([]icicle_bn254.Affine, []icicle_bn254.Affine, core.MSMConfig, core.MSMConfig, runtime.EIcicleError) {
Contributor

Add a comment explaining what this is doing. Why are we returning two MSMConfigs for G1 and G2 if they are the same, for example?

Contributor Author

G2 MSM is not used right now but if we wanted to accelerate the length proof commitment then we would need it. Going to remove it from the configuration for now.

encoding/icicle/ntt_setup.go (resolved)
@jianoaix jianoaix (Contributor) left a comment

Can we have a diff-neutral PR just for the refactoring? It seems there is code here that doesn't need deep review, as it's just existing code moved around?

encoding/rs/serialization.go (resolved)

func CreateIcicleBackendEncoder(p *Encoder, params encoding.EncodingParams, fs *fft.FFTSettings) (*ParametrizedEncoder, error) {
// Not supported
return nil, fmt.Errorf("icicle backend called without icicle build tag")
Contributor

May use errors.New("...") since this has no formatting needs

encoding/rs/serialization.go (resolved)

dmanc commented Nov 15, 2024

Can we have a diff-neutral PR just for the refactoring? It seems there is code here that doesn't need deep review, as it's just existing code moved around?

Yes, if we're happy with this new configuration method, then I'll make another PR with that and rebase this one.

RUN go build -tags=icicle -o ./bin/server ./cmd/encoder

# Start a new stage for the base image
FROM nvidia/cuda:12.2.2-base-ubuntu22.04
Contributor Author

Image size comparison:

ghcr.io/layr-labs/eigenda/encoder latest 781c3866c5dd 6 hours ago 41MB
6129846e8150 6 hours ago 41MB
ghcr.io/layr-labs/eigenda/encoder-icicle latest f530fe9c250d 21 hours ago 760MB

prover.WithGPU(config.ServerConfig.GPUEnable),
prover.WithRSEncoder(rsEncoder),
}
prover, err := prover.NewProver(popts...)
Contributor Author

Trying to follow options pattern with this change https://golang.cafe/blog/golang-functional-options-pattern.html


func CreateIcicleBackendProver(p *Prover, params encoding.EncodingParams, fs *fft.FFTSettings, ks *kzg.KZGSettings) (*ParametrizedProver, error) {
// Not supported
return nil, fmt.Errorf("icicle backend called without icicle build tag")
Contributor Author

The point of the noicicle.go file is that the icicle backend should only be functional when the icicle build tag is used. This way we don't need to include the icicle backend files in the traditional encoder build, which could otherwise break the existing build.
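The two-file layout behind this can be sketched as follows. Shown here as one runnable file for illustration, with the `//go:build` lines that would normally gate each file given in comments; the function name is illustrative, not the PR's exact API.

```go
package main

import (
	"errors"
	"fmt"
)

// In the real layout this stub would live in a file guarded by
// `//go:build !icicle`, so it compiles only when the icicle tag
// is absent and plain `go build` never pulls in icicle's cgo deps.
func createIcicleBackend() (any, error) {
	return nil, errors.New("icicle backend called without icicle build tag")
}

// A sibling file guarded by `//go:build icicle` would provide the
// working implementation, selected by `go build -tags=icicle`.

func main() {
	if _, err := createIcicleBackend(); err != nil {
		fmt.Println(err) // prints "icicle backend called without icicle build tag"
	}
}
```

With both files present, the linker sees exactly one definition of the function for any given tag set, which is what keeps the default build working.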

}

// KzgCommitmentsBackend represents a backend capable of computing various KZG commitments.
type KzgCommitmentsBackend interface {
Contributor Author

Moved commitments to its own backend in case we would like to accelerate them in the future.

@@ -331,3 +457,99 @@ func toUint64Array(chunkIndices []encoding.ChunkNumber) []uint64 {
}
return res
}

func (p *Prover) newProver(params encoding.EncodingParams) (*ParametrizedProver, error) {
Contributor Author

this file should be reviewed carefully

)

// VerifierOption defines a function that configures a Verifier
type VerifierOption func(*Verifier) error
Contributor Author

This can go in its own PR; it's applying the same options-pattern refactor to the verifier, so it's not super relevant to the GPU encoder.

Contributor

It looks more verbose and deviates from the convention; what are the main problems it solves here compared to passing in a config struct?

Contributor Author

One argument I would give for this configuration pattern is that it provides a simpler constructor that is extensible and backwards compatible.

For example, with a config struct we would have to modify every client that calls the constructor whenever we add a new configuration value. With this pattern, you add a WithX function in the library that sets the field and does whatever validation it needs, and clients can then optionally adopt the new configuration.
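That argument can be sketched concretely. The names below are illustrative stand-ins, not the PR's actual prover API; the point is the shape of the pattern: defaults first, then each option applied with its own validation.

```go
package main

import "fmt"

// Prover holds configuration set via options (illustrative fields).
type Prover struct {
	loadG2Points bool
	backend      string
}

// ProverOption configures a Prover and may reject invalid values.
type ProverOption func(*Prover) error

func WithBackend(b string) ProverOption {
	return func(p *Prover) error {
		if b == "" {
			return fmt.Errorf("backend must not be empty")
		}
		p.backend = b
		return nil
	}
}

func WithLoadG2Points(load bool) ProverOption {
	return func(p *Prover) error {
		p.loadG2Points = load
		return nil
	}
}

// NewProver applies defaults, then each option in order. Adding a
// new WithX later doesn't change this signature, so existing
// callers keep compiling unchanged.
func NewProver(opts ...ProverOption) (*Prover, error) {
	p := &Prover{backend: "default"} // defaults live here
	for _, opt := range opts {
		if err := opt(p); err != nil {
			return nil, err
		}
	}
	return p, nil
}

func main() {
	p, err := NewProver(WithBackend("icicle"), WithLoadG2Points(false))
	if err != nil {
		panic(err)
	}
	fmt.Println(p.backend, p.loadG2Points) // prints "icicle false"
}
```

Cross-field invariants can still be checked once, after the option loop in NewProver, which partly answers the multi-field-validation concern raised below.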

Contributor

with a config struct we would have to modify every client that calls the constructor when we add a new configuration value

Not sure about this, if that new field is optional, the caller doesn't need to do anything. If it's required, callers will have to make changes in either case.

My concerns are the boilerplate and the scalability of continually adding WithXX; we may end up with a long list of fields (like in some big structs).
Also it looks like it still needs to operate on a config struct if the validation runs across multiple fields (e.g. invariants defined over multiple fields).

@dmanc dmanc (Contributor Author) Nov 16, 2024

My concerns are the boilerplate and the scalability of continually adding WithXX; we may end up with a long list of fields (like in some big structs).

These seem like tractable problems though; here is an example in the grpc library:
https://github.com/grpc/grpc-go/blob/66385b28b3fe71a4895f00d581ede0a344743a3f/dialoptions.go#L248

"github.com/consensys/gnark-crypto/ecc/bn254/fr"
)

type ParametrizedEncoder struct {
Contributor Author

Refactor to make the encoder parametrized based on FFT settings (similar to the Prover). I don't think it's too valuable, since the max time for loading the largest FFT setting is ~600ms, so I'm thinking of reverting.

@@ -0,0 +1,103 @@
//go:build icicle

package icicle
Contributor Author

This folder contains the icicle-backend-related setup.

@@ -0,0 +1,46 @@
//go:build icicle

package icicle
Contributor Author

This folder contains the icicle-related computations for MultiProofs.

"github.com/ingonyama-zk/icicle/v3/wrappers/golang/runtime"
)

type RsIcicleComputeDevice struct {
Contributor Author

This contains the icicle-related computations for RS encode.


dmanc commented Nov 15, 2024

One thing I'm not too sure how to deal with is testing the icicle/GPU encoder's correctness. The only strategy I can think of is to compare the output of the default backend and the icicle backend.

Also, if we wanted it to be part of CI, we would need to acquire GPU-based GitHub Actions runners.
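That comparison strategy can be sketched as a differential test. Both "backends" below are toy stand-ins so the sketch runs anywhere; the real test would call the default and icicle provers on the same inputs. The harness shape (fixed seed, random inputs, first-mismatch report) is the point.

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
)

// referenceEncode stands in for the default CPU backend and
// acceleratedEncode for the icicle backend; here both compute the
// same toy transform (multiply by 3) two different ways.
func referenceEncode(data []uint64) []uint64 {
	out := make([]uint64, len(data))
	for i, v := range data {
		out[i] = v * 3
	}
	return out
}

func acceleratedEncode(data []uint64) []uint64 {
	out := make([]uint64, len(data))
	for i, v := range data {
		out[i] = v + v + v
	}
	return out
}

// differentialCheck feeds identical random inputs to both backends
// and reports the first disagreement.
func differentialCheck(trials, size int) error {
	rng := rand.New(rand.NewSource(42)) // fixed seed: failures are reproducible
	for t := 0; t < trials; t++ {
		data := make([]uint64, size)
		for i := range data {
			data[i] = rng.Uint64() % 1000
		}
		ref, acc := referenceEncode(data), acceleratedEncode(data)
		if !reflect.DeepEqual(ref, acc) {
			return fmt.Errorf("trial %d: backends disagree", t)
		}
	}
	return nil
}

func main() {
	if err := differentialCheck(100, 64); err != nil {
		panic(err)
	}
	fmt.Println("backends agree on all trials")
}
```

Gating such a test behind the icicle build tag would let it run only on the GPU runners mentioned above, while the default CI stays unchanged.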

encoding/rs/icicle.go (resolved)

func CreateIcicleBackendEncoder(e *Encoder, params encoding.EncodingParams, fs *fft.FFTSettings) (*ParametrizedEncoder, error) {
icicleDevice, err := icicle.NewIcicleDevice(icicle.IcicleDeviceConfig{
GPUEnable: e.Config.GPUEnable,
Contributor

It looks like icicle will also support CPU now?

Contributor Author

Yes, but it's not as performant as the gnark backend; I'm trying to look into why.


var evals []fr.Element
g.NttCfg.BatchSize = int32(1)
runtime.RunOnDevice(&g.Device, func(args ...any) {
Contributor

If the GPU is enabled, then this func will be executed on the GPU by the runtime?

Contributor Author

Yes, because the device variable will be set to the GPU device here.

Contributor

For performance, it looks like the main data movement to/from the GPU will be the coefficients and the resulting evaluations?

Contributor Author

For RS encode, yes.

For multiproofs it's also the same (move coefficients in, move proofs out), but in addition we need to transfer the SRS table to compute the MSM. This could be done in SetupMsm at initialization, but I'm not sure how much speedup that gives.


return &ParametrizedEncoder{
	Fs:          fs,
	verbose:     verbose,
	NumRSWorker: runtime.GOMAXPROCS(0),
Contributor

So when you say "backend", it means a software library that computes the RS encoding? I.e., it decouples the device (CPU/GPU) from the backend (the software that executes the compute on the device)?

Contributor Author

Right now I consider a backend to be a library that implements the cryptographic primitives like NTT and MSM. We have our own implementation of NTT borrowed from protolambda, and MSM from gnark.

It's plausible that there exist libraries implementing the higher-level primitives like multiproofs, commitments, and RS encode, and we could make our code more of a frontend that calls those. But for now we focus on the core primitives that give the speedup factor.

encoding/rs/extend_poly.go (resolved)
@dmanc dmanc marked this pull request as ready for review November 16, 2024 00:03
3 participants