
ZK Abstraction Layer - PoC #107

Closed

Conversation

@mratsim mratsim commented Nov 29, 2023

PoC branch of the ZK Abstraction Layer.

Next steps:

  • Propagate the changes up to Halo2 and see how intrusive they are.
  • Implement a non-Rust ZAL backend to show how to glue accelerators with Halo2.

Out-of-scope

We keep the PoC simple and effective so that it can serve as a basis for discussion:

  • in terms of API
  • in terms of release management, as this requires breaking all users' APIs by introducing a ZAL engine parameter everywhere. I.e. we might want to:
    • tag a release with all the improvements made and pending quick wins, so that an "old interface" release, possibly an LTS, is available (say a 6-month transition period).
    • tag a release specifically for this change for Halo2 and halo2curves.

Hence the following are out-of-scope:

  • MSM with cached bases
  • FFTs
  • coset FFTs
  • async API

mratsim commented Dec 4, 2023

I can confirm that the current ZAL API proposal works with a non-Rust ZAL backend, in this case Constantine.

I've created a document for accelerator providers that outlines how to integrate the halo2curves ZAL with a C backend: https://github.com/mratsim/constantine/blob/master/docs/zk_accel_layer.md

mratsim commented Dec 5, 2023

I've added a caching API for discussion with opaque handles to the coefficients and base of the MSM.

halo2curves/src/zal.rs

Lines 42 to 81 in 014cf86

// The ZK Accel Layer API
// ---------------------------------------------------
pub trait ZalEngine {}

pub trait MsmAccel<C: CurveAffine>: ZalEngine {
    fn msm(&self, coeffs: &[C::Scalar], base: &[C]) -> C::Curve;

    // Caching API
    // -------------------------------------------------
    // From here we propose an extended API
    // that allows reusing coeffs and/or the base points.
    //
    // This is inspired by the descriptor usage of the
    // cuDNN API (Nvidia GPU): https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnn-ops-infer-so-opaque
    // and the oneDNN API (CPU, OpenCL): https://github.com/oneapi-src/oneDNN/blob/master/doc/programming_model/basic_concepts.md
    //
    // Descriptors are opaque pointers that hold the input in a format suitable for the accelerator engine.
    // They may be:
    // - Input moved on the accelerator device (only once for repeated calls)
    // - Endianness conversion
    // - Conversion from Montgomery to canonical form
    // - Input changed from projective to Jacobian coordinates or even to a twisted Edwards curve
    // - other forms of expensive preprocessing

    type CoeffsDescriptor<'c>;
    type BaseDescriptor<'b>;

    fn get_coeffs_descriptor<'c>(&self, coeffs: &'c [C::Scalar]) -> Self::CoeffsDescriptor<'c>;
    fn get_base_descriptor<'b>(&self, base: &'b [C]) -> Self::BaseDescriptor<'b>;

    // Execute MSM according to descriptors
    // Unsure of naming: msm_with_cached_inputs, msm_apply, msm_cached, msm_with_descriptors, ...
    fn msm_with_cached_scalars(&self, coeffs: &Self::CoeffsDescriptor<'_>, base: &[C]) -> C::Curve;
    fn msm_with_cached_base(&self, coeffs: &[C::Scalar], base: &Self::BaseDescriptor<'_>) -> C::Curve;
    fn msm_with_cached_inputs(&self, coeffs: &Self::CoeffsDescriptor<'_>, base: &Self::BaseDescriptor<'_>) -> C::Curve;
}

I don't know if caching scalars is useful; it can be removed if not.

I expect the following use cases for computations using the same inputs:

  • caching on the accelerator device (i.e. avoiding copies)
  • caching endianness conversion
  • caching conversion from Montgomery to canonical form
  • caching conversion from projective to Jacobian coordinates, or even to a twisted Edwards curve
  • caching other forms of expensive preprocessing
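These reuse patterns can be sketched with a minimal, self-contained toy engine. This is not the proposed halo2curves API: u64 stands in for both scalars and curve points, only the base-caching path is shown, and all names are illustrative.

```rust
// Toy sketch of the descriptor-caching pattern (not the real API).
struct CpuEngine;

// Opaque handle holding the base in engine-ready form. A GPU engine
// would hold a device pointer here instead of a borrowed slice.
struct BaseDescriptor<'b> {
    raw: &'b [u64],
}

impl CpuEngine {
    fn get_base_descriptor<'b>(&self, base: &'b [u64]) -> BaseDescriptor<'b> {
        // Expensive preprocessing (device upload, endianness/Montgomery
        // conversion, ...) would happen once, here.
        BaseDescriptor { raw: base }
    }

    fn msm_with_cached_base(&self, coeffs: &[u64], base: &BaseDescriptor<'_>) -> u64 {
        // Toy "MSM": an inner product instead of scalar multiplications.
        coeffs.iter().zip(base.raw.iter()).map(|(c, b)| c * b).sum()
    }
}

fn main() {
    let engine = CpuEngine;
    let base = [2u64, 3, 5];
    // Pay the preprocessing once, reuse across many calls:
    let base_d = engine.get_base_descriptor(&base);
    assert_eq!(engine.msm_with_cached_base(&[1, 1, 1], &base_d), 10);
    assert_eq!(engine.msm_with_cached_base(&[1, 0, 1], &base_d), 7);
}
```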

mratsim commented Dec 5, 2023

One thing missing here is error handling, for example when the GPU runs out of memory or GPU device 10 is unavailable.

At the Halo2 and halo2curves level those errors seem unrecoverable, so I think the trait impl should just panic.
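A sketch of what panicking in the trait impl could look like. `gpu_msm` is a hypothetical stand-in simulating a device call that can fail; nothing here is part of the actual proposal.

```rust
// Hypothetical accelerator call (stand-in): simulate running out of
// device memory above some size.
fn gpu_msm(size: usize) -> Result<u64, String> {
    if size > (1 << 20) {
        Err("GPU out of memory".to_string())
    } else {
        Ok(42)
    }
}

// The trait impl surfaces backend failures as panics instead of
// threading a Result through every prover API:
fn msm_or_panic(size: usize) -> u64 {
    gpu_msm(size).unwrap_or_else(|e| panic!("ZAL backend failure: {e}"))
}

fn main() {
    assert_eq!(msm_or_panic(1024), 42);
}
```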

mratsim commented Dec 11, 2023

Looking at how Ingonyama integrated with EZKL fork of halo2, it seems like HW accel is only worth starting from a certain size threshold:

https://github.com/zkonduit/halo2/pull/3/files

pub fn should_use_cpu_msm(size: usize) -> bool {
    size <= (1 << u8::from_str_radix(&env::var("ICICLE_SMALL_K").unwrap_or("8".to_string()), 10).unwrap())
}

Instead of increasing the noise in the prover code, the msm function can do this internally to pick whether to use the builtin Halo2 MSM or HW accel.
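A hedged sketch of folding the cutoff into msm itself, mirroring the ICICLE_SMALL_K check above; `cpu_msm` and `gpu_msm` are toy stand-ins, not real backends.

```rust
use std::env;

// Reads the cutoff exponent, defaulting to 8 (so 2^8 = 256 points).
fn small_k_threshold() -> u32 {
    env::var("ICICLE_SMALL_K")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(8)
}

// Stand-ins for the builtin Halo2 MSM and the accelerated one.
fn cpu_msm(_coeffs: &[u64]) -> &'static str { "cpu" }
fn gpu_msm(_coeffs: &[u64]) -> &'static str { "gpu" }

// The prover just calls `msm`; the engine picks the backend internally.
fn msm(coeffs: &[u64]) -> &'static str {
    if coeffs.len() <= (1usize << small_k_threshold()) {
        cpu_msm(coeffs)
    } else {
        gpu_msm(coeffs)
    }
}

fn main() {
    assert_eq!(msm(&[0u64; 16]), "cpu");      // 16 <= 256: small, stay on CPU
    assert_eq!(msm(&vec![0u64; 300]), "gpu"); // 300 > 256: offload
}
```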

cc @jeremyfelder

@jeremyfelder
it seems like HW accel is only worth starting from a certain size threshold:

@mratsim I wouldn't say this is true across the board. For now, our MSM algorithm isn't quite as performant on smaller circuits. We do have a batching method that performs multiple MSMs in one GPU call, which is performant for smaller circuits; this env variable toggles where that cutoff is.

Instead of increasing the noise in the prover code, the msm function can do this internally to pick whether to use the builtin Halo2 MSM or HW accel.

It's possible for the msm function implemented for a specific accelerator to do this internally and fall back to CPU if needed; we essentially did this within the EZKL integration as well.

The only caveat I'll mention is that using our batch MSM usually requires integration further up the call stack. I think it is possible to bring it into the msm function as well, though: write any commitments as if they will be a batch, and let the msm function decide whether to use batch or single calls depending on some user-defined configuration like SMALL_CIRCUIT or NUM_{ADVICE|LOOKUPS|etc}.

See here for an example of moving to batched operations

mratsim commented Jan 12, 2024

Turns out the caching API is somewhat problematic because of dyn trait, see https://rust-lang.github.io/async-fundamentals-initiative/evaluation/challenges/dyn_traits.html


In the current full PoC taikoxyz/halo2#14, I use dyn Trait so that the code is not monomorphized, and when using Halo2 as a library the ZalEngine can be changed without recompilation.

Alternatives:

  • Don't use &dyn MsmAccel<C>; use &impl MsmAccel<C> instead. Tradeoff: code size (the same code is compiled once per backend), and backends cannot be used as "plugins" when using Halo2 as a .dll/.so.
  • Remove the caching API. Tradeoff: it's a new feature, so we don't lose anything. It may be possible to reintroduce it via boxing the descriptors.
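The two call styles can be illustrated with a stripped-down trait (no associated types, toy u64 "MSM"), showing that both compile once the caching API is out of the picture:

```rust
// Stripped-down trait with no associated types, so both call styles
// work; adding the caching API's associated types would break prove_dyn.
trait MsmAccel {
    fn msm(&self, coeffs: &[u64]) -> u64;
}

struct H2cEngine;
impl MsmAccel for H2cEngine {
    fn msm(&self, coeffs: &[u64]) -> u64 {
        coeffs.iter().sum()
    }
}

// `&impl`: monomorphized, one compiled instance per engine type.
fn prove_generic(engine: &impl MsmAccel, coeffs: &[u64]) -> u64 {
    engine.msm(coeffs)
}

// `&dyn`: a single compiled instance, engine swappable at runtime
// (e.g. loaded from a .so), but requires an object-safe trait.
fn prove_dyn(engine: &dyn MsmAccel, coeffs: &[u64]) -> u64 {
    engine.msm(coeffs)
}

fn main() {
    let engine = H2cEngine;
    assert_eq!(prove_generic(&engine, &[1, 2, 3]), 6);
    assert_eq!(prove_dyn(&engine, &[1, 2, 3]), 6);
}
```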

CPerezz commented Jan 17, 2024

First round of thoughts from @ed255 and me: https://hackmd.io/F6W1U6X8SLCqTciJt0nfiQ?edit

Happy to discuss more in a meeting or whatever works best

src/zal.rs Outdated
//! - an initialization function:
//!   - either "fn new() -> ZalEngine" for simple libraries
//!   - or a builder pattern for complex initializations
//! - a shutdown function.
Member

What about handling shutdown via Drop?

Contributor Author

Yes, as discussed it's what makes the most sense.
In general this is a recommendation to document how the engine should be destroyed.
If the end-user doesn't need to do anything, even better.

src/zal.rs Outdated
Comment on lines 68 to 69
type CoeffsDescriptor<'c>;
type BaseDescriptor<'b>;
Member

Could these descriptors be a non-associated type? For example:

struct CoeffsDescriptor(u64);
struct BaseDescriptor(u64);

And then the engine holds a mapping between the ids and a possible internal representation of the descriptors? This way we simplify this trait. Also, I'm not sure if associated types are compatible with trait objects, which may matter for how MsmAccel is used.
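A self-contained sketch of this id-handle alternative, with toy u64 types and illustrative names; note that the engine now has to own state behind the ids:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Plain newtype handle instead of an associated type.
struct BaseDescriptor(u64);

// The engine owns the internal (possibly device-side) representation
// behind each id.
struct Engine {
    next_id: RefCell<u64>,
    bases: RefCell<HashMap<u64, Vec<u64>>>,
}

impl Engine {
    fn new() -> Self {
        Engine { next_id: RefCell::new(0), bases: RefCell::new(HashMap::new()) }
    }

    fn get_base_descriptor(&self, base: &[u64]) -> BaseDescriptor {
        let mut id = self.next_id.borrow_mut();
        *id += 1;
        // Copy (and, on a real device, upload/convert) the bases.
        self.bases.borrow_mut().insert(*id, base.to_vec());
        BaseDescriptor(*id)
    }

    fn msm_with_cached_base(&self, coeffs: &[u64], base: &BaseDescriptor) -> u64 {
        let bases = self.bases.borrow();
        let b = bases.get(&base.0).expect("descriptor from another engine");
        coeffs.iter().zip(b.iter()).map(|(c, p)| c * p).sum()
    }
}

fn main() {
    let engine = Engine::new();
    let d = engine.get_base_descriptor(&[2, 3, 5]);
    assert_eq!(engine.msm_with_cached_base(&[1, 1, 1], &d), 10);
}
```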

Contributor Author

It's a possibility, but that requires engines to hold state. While Rust has hashmaps in its standard library, in C or C++ it becomes annoying.

I think it's easier for now to use associated types and replace the &dyn with &impl.


fn get_coeffs_descriptor<'c>(&self, coeffs: &'c [C::Scalar]) -> Self::CoeffsDescriptor<'c> {
    // Do expensive device/library specific preprocessing here
    Self::CoeffsDescriptor { raw: coeffs }
}
Member

I think it would be hard to implement this following my previous comment. I was assuming the engine would copy the coeffs and base (and possibly transform them), but in this case it's just a reference.

Maybe with a Box<dyn Descriptor<'a>>, where the Descriptor trait is empty, and then the engine just casts it to the internal type?
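One way to sketch the boxed-descriptor idea is with std::any::Any, letting the engine downcast the opaque box back to its concrete internal type. Toy u64 types and illustrative names; the real lifetimes would be trickier.

```rust
use std::any::Any;

// Engine-internal descriptor type, hidden behind Box<dyn Any>.
struct CpuBase(Vec<u64>);

struct CpuEngine;

impl CpuEngine {
    fn get_base_descriptor(&self, base: &[u64]) -> Box<dyn Any> {
        // Copy (or upload) the bases into the engine's own representation.
        Box::new(CpuBase(base.to_vec()))
    }

    fn msm_with_cached_base(&self, coeffs: &[u64], base: &dyn Any) -> u64 {
        // Cast the opaque descriptor back to this engine's concrete type.
        let b = base
            .downcast_ref::<CpuBase>()
            .expect("descriptor created by a different engine");
        coeffs.iter().zip(b.0.iter()).map(|(c, p)| c * p).sum()
    }
}

fn main() {
    let engine = CpuEngine;
    let d = engine.get_base_descriptor(&[2, 3, 5]);
    assert_eq!(engine.msm_with_cached_base(&[1, 1, 1], d.as_ref()), 10);
}
```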

Member

I think there's a really interesting article related to that! https://smallcultfollowing.com/babysteps/blog/2022/03/29/dyn-can-we-make-dyn-sized/

In any case, we might be able to not require the box if Descriptor is Copy, right? The box is the only ugly part of this.

@mratsim commented Jan 22, 2024

In this case it is just a reference, but that's an artifact of the H2cEngine being CPU-based, so no copies are required.

For GPU and FPGA engines there will be a copy to device memory, except for Apple Metal thanks to unified memory.
That said, once copied we will likely still hold an (owned) reference, as we would be in C FFI land.

Contributor Author

In any case, we might be able to not require the box if Descriptor is Copy right? Which is the only ugly part of this.

For me this makes sense; I don't see a use-case for non-Copy descriptors.

The most complex descriptors should be:

  • pointer(s) to data
  • size of data
  • data-in layout (for example canonical or bit-reversed permuted)
  • data-out layout
  • enum or integer ID for operation (MSM, FFT, coset FFT)
  • device ID(s)

and if those are needed, maybe those data structures should live in a hashmap in the engine, and the engine just returns a handle ID as the descriptor.

pub trait ZalEngine {}

pub trait MsmAccel<C: CurveAffine>: ZalEngine {
    fn msm(&self, coeffs: &[C::Scalar], base: &[C]) -> C::Curve;
Member

The user of this trait can either use this method, or the cached version. Perhaps this method could have a default implementation that uses the cached version underneath?
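On a stripped-down toy trait (u64 stand-ins, base-caching only, illustrative names), that default could look like this: implementers would only write the cached path and get the stateless msm for free.

```rust
// Toy trait: `msm` gets a default implementation routed through the
// caching entry points, so a backend only writes the cached path.
trait MsmAccel {
    type BaseDescriptor;

    fn get_base_descriptor(&self, base: &[u64]) -> Self::BaseDescriptor;
    fn msm_with_cached_base(&self, coeffs: &[u64], base: &Self::BaseDescriptor) -> u64;

    // Default: stateless msm in terms of the cached version.
    fn msm(&self, coeffs: &[u64], base: &[u64]) -> u64 {
        let d = self.get_base_descriptor(base);
        self.msm_with_cached_base(coeffs, &d)
    }
}

struct CpuEngine;
impl MsmAccel for CpuEngine {
    type BaseDescriptor = Vec<u64>;

    fn get_base_descriptor(&self, base: &[u64]) -> Vec<u64> {
        base.to_vec()
    }

    fn msm_with_cached_base(&self, coeffs: &[u64], base: &Vec<u64>) -> u64 {
        // Toy "MSM": an inner product.
        coeffs.iter().zip(base.iter()).map(|(c, b)| c * b).sum()
    }
}

fn main() {
    assert_eq!(CpuEngine.msm(&[1, 2], &[10, 20]), 50);
}
```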

Contributor Author

I think that can be a future optimization/discussion point once there is more provider feedback.

Implementers always start with the stateless msm and later, when they hit bottlenecks, implement the cached version. So they'll likely focus on msm and have the cached versions fall back on the stateless msm, like I did in:

halo2curves/src/zal.rs

Lines 117 to 127 in 2e6b89a

fn msm_with_cached_scalars(&self, coeffs: &Self::CoeffsDescriptor<'_>, base: &[C]) -> C::Curve {
    best_multiexp(coeffs.raw, base)
}

fn msm_with_cached_base(&self, coeffs: &[C::Scalar], base: &Self::BaseDescriptor<'_>) -> C::Curve {
    best_multiexp(coeffs, base.raw)
}

fn msm_with_cached_inputs(&self, coeffs: &Self::CoeffsDescriptor<'_>, base: &Self::BaseDescriptor<'_>) -> C::Curve {
    best_multiexp(coeffs.raw, base.raw)
}

mratsim commented Feb 19, 2024

Following discussion on Thursday (2024-02-15), superseded by privacy-scaling-explorations/halo2#277
