ZK Abstraction Layer - PoC #107
Conversation
I can confirm that the current ZAL API proposal works with a non-Rust ZAL backend, in this case Constantine. I've created a document for accelerator providers that outlines how to integrate halo2curves ZAL with a C backend: https://github.com/mratsim/constantine/blob/master/docs/zk_accel_layer.md

I've added a caching API for discussion, with opaque handles to the coefficients and bases of the MSM (lines 42 to 81 in 014cf86).
I don't know if caching scalars is useful; it can be removed if not. I expect the following use cases for computations reusing the same inputs:
One thing missing here is error handling, for example when the GPU runs out of memory or GPU device 10 is unavailable. At the halo2 and halo2curves level these seem unrecoverable to me, so I think the trait impl should just panic.
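To make the "just panic" stance concrete, here is a minimal sketch of a backend that checks device status at construction and panics rather than returning a `Result`. `DeviceStatus`, `GpuEngine`, and `probe` are hypothetical stand-ins, not part of the ZAL proposal or any real driver API.

```rust
// Hypothetical types illustrating panic-on-unrecoverable-error.
enum DeviceStatus {
    Ready,
    OutOfMemory,
    Unavailable,
}

struct GpuEngine {
    device_id: u32,
}

impl GpuEngine {
    fn new(device_id: u32) -> Self {
        // At the halo2/halo2curves level these errors are unrecoverable,
        // so we panic instead of propagating a Result through the prover.
        match Self::probe(device_id) {
            DeviceStatus::Ready => GpuEngine { device_id },
            DeviceStatus::OutOfMemory => panic!("GPU {device_id}: out of memory"),
            DeviceStatus::Unavailable => panic!("GPU {device_id}: device unavailable"),
        }
    }

    fn probe(_device_id: u32) -> DeviceStatus {
        // A real backend would query the driver here; we pretend it succeeds.
        DeviceStatus::Ready
    }
}

fn main() {
    let engine = GpuEngine::new(0);
    assert_eq!(engine.device_id, 0);
}
```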
(force-pushed 014cf86 to 2e6b89a)
Looking at how Ingonyama integrated with the EZKL fork of halo2, it seems like HW accel is only worth it starting from a certain size threshold: https://github.com/zkonduit/halo2/pull/3/files

```rust
pub fn should_use_cpu_msm(size: usize) -> bool {
    size <= (1 << u8::from_str_radix(&env::var("ICICLE_SMALL_K").unwrap_or("8".to_string()), 10).unwrap())
}
```

Instead of increasing the noise in the prover code, the […]
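One way to keep that threshold check out of the prover is a dispatching wrapper engine that routes small MSMs to the CPU and large ones to the accelerator. The sketch below is illustrative only: `Msm`, `CpuEngine`, and `DispatchEngine` are hypothetical names, and u64 dot products stand in for curve arithmetic; only the `ICICLE_SMALL_K` env var comes from the linked PR.

```rust
use std::env;

// Stand-in for an MSM engine trait (u64 sums instead of curve points).
trait Msm {
    fn msm(&self, coeffs: &[u64], bases: &[u64]) -> u64;
}

struct CpuEngine;
impl Msm for CpuEngine {
    fn msm(&self, coeffs: &[u64], bases: &[u64]) -> u64 {
        coeffs.iter().zip(bases).map(|(c, b)| c * b).sum()
    }
}

/// CPU fallback cutoff: sizes <= 2^ICICLE_SMALL_K (default 8) stay on CPU.
fn should_use_cpu_msm(size: usize) -> bool {
    let k: u32 = env::var("ICICLE_SMALL_K")
        .ok()
        .and_then(|v| v.parse().ok())
        .unwrap_or(8);
    size <= (1usize << k)
}

// Wrapper that hides the threshold decision from the prover code.
struct DispatchEngine<A: Msm> {
    cpu: CpuEngine,
    accel: A,
}

impl<A: Msm> Msm for DispatchEngine<A> {
    fn msm(&self, coeffs: &[u64], bases: &[u64]) -> u64 {
        if should_use_cpu_msm(coeffs.len()) {
            self.cpu.msm(coeffs, bases)
        } else {
            self.accel.msm(coeffs, bases)
        }
    }
}

fn main() {
    // A second CpuEngine stands in for the "accelerator" backend.
    let engine = DispatchEngine { cpu: CpuEngine, accel: CpuEngine };
    assert_eq!(engine.msm(&[1, 2, 3], &[4, 5, 6]), 32); // 1*4 + 2*5 + 3*6
}
```

The prover only ever sees one `Msm` implementation; the cutoff policy lives entirely in the wrapper.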
@mratsim I wouldn't say this is true across the board. For now, our MSM algorithm isn't quite as performant on smaller circuits. We do have a batching method that performs multiple MSMs in one call to the GPU, which is performant for smaller circuits, and this env variable toggles where that cutoff is.

It's possible for the […]. The only caveat I'll mention is that using our batch MSM usually requires integration at a level further up the call stack, although I think it is possible to bring it into the […]. See here for an example of moving to batched operations.
Turns out the caching API is somewhat problematic because of […]. In the current full PoC (taikoxyz/halo2#14), I use […]. Alternatives:
First round of thoughts from @ed255 and me: https://hackmd.io/F6W1U6X8SLCqTciJt0nfiQ?edit
Happy to discuss more in a meeting or whatever works best.
src/zal.rs (outdated)

```rust
//! - an initialization function:
//!   - either "fn new() -> ZalEngine" for simple libraries
//!   - or a builder pattern for complex initializations
//! - a shutdown function.
```
What about handling shutdown via `Drop`?
Yes, as discussed, that's what makes the most sense. In general this is a recommendation to document how the engine should be destroyed. If the end-user doesn't need to do anything, even better.
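A minimal sketch of Drop-based shutdown, so the end-user needs no explicit teardown call. The `ZalEngine` struct and the commented-out `ffi_engine_release` hook are hypothetical; a real C backend would release its context inside `drop`.

```rust
// Hypothetical engine wrapping an opaque handle to a device/library context.
struct ZalEngine {
    handle: usize,
}

impl ZalEngine {
    fn new() -> Self {
        // Pretend we acquired a device context from a C library here.
        ZalEngine { handle: 1 }
    }
}

impl Drop for ZalEngine {
    fn drop(&mut self) {
        // A real backend would call into its C library here, e.g.
        // unsafe { ffi_engine_release(self.handle) } (hypothetical FFI).
        self.handle = 0;
    }
}

fn main() {
    let engine = ZalEngine::new();
    assert_eq!(engine.handle, 1);
    // `engine` is released automatically when it goes out of scope;
    // the end-user never calls a shutdown function.
}
```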
src/zal.rs (outdated)

```rust
type CoeffsDescriptor<'c>;
type BaseDescriptor<'b>;
```
Could these descriptors be a non-associated type? For example:

```rust
struct CoeffsDescriptor(u64);
struct BaseDescriptor(u64);
```

And then the engine holds a mapping between the ids and a possible internal representation of the descriptors? This way we simplify this trait. Also, I'm not sure if associated types are compatible with trait objects, which may be a candidate for how to use `MsmAccel`.
It's a possibility, but that requires engines to hold state. While Rust has hashmaps in its standard library, in C or C++ it becomes annoying. I think it's easier for now to use associated types and remove the `&dyn` in favor of `&impl`.
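To illustrate the associated-type approach: the engine stays stateless, descriptors borrow the caller's data, and callers take `&impl MsmAccel` instead of a trait object. This is a simplified sketch with u64 stand-ins, not the real ZAL trait (which is generic over `C: CurveAffine`).

```rust
// Simplified MsmAccel with a generic associated type for the descriptor.
trait MsmAccel {
    type CoeffsDescriptor<'c>;
    fn get_coeffs_descriptor<'c>(&self, coeffs: &'c [u64]) -> Self::CoeffsDescriptor<'c>;
    fn msm_with_cached_scalars(&self, coeffs: &Self::CoeffsDescriptor<'_>, base: &[u64]) -> u64;
}

// CPU engine: the descriptor is just a borrow, no copy and no engine state.
struct CpuEngine;
struct CpuCoeffs<'c> {
    raw: &'c [u64],
}

impl MsmAccel for CpuEngine {
    type CoeffsDescriptor<'c> = CpuCoeffs<'c>;

    fn get_coeffs_descriptor<'c>(&self, coeffs: &'c [u64]) -> CpuCoeffs<'c> {
        // A GPU backend would do expensive preprocessing/copying here.
        CpuCoeffs { raw: coeffs }
    }

    fn msm_with_cached_scalars(&self, coeffs: &CpuCoeffs<'_>, base: &[u64]) -> u64 {
        coeffs.raw.iter().zip(base).map(|(c, b)| c * b).sum()
    }
}

// `&impl` works fine here; `&dyn MsmAccel` would not, because the trait
// has a generic associated type and is not object-safe.
fn run(engine: &impl MsmAccel) -> u64 {
    let coeffs = [1u64, 2, 3];
    let d = engine.get_coeffs_descriptor(&coeffs);
    engine.msm_with_cached_scalars(&d, &[4, 5, 6])
}

fn main() {
    assert_eq!(run(&CpuEngine), 32); // 1*4 + 2*5 + 3*6
}
```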
```rust
fn get_coeffs_descriptor<'c>(&self, coeffs: &'c [C::Scalar]) -> Self::CoeffsDescriptor<'c> {
    // Do expensive device/library specific preprocessing here
    Self::CoeffsDescriptor { raw: coeffs }
}
```
I think it would be hard to implement this following my previous comment. I was assuming the engine would copy the coeffs and base (and possibly transform them), but in this case it's just a reference. Maybe with a `Box<dyn Descriptor<'a>>`, where the `Descriptor` trait is empty, and then the engine just casts it to the internal type?
I think there's a really interesting article related to that: https://smallcultfollowing.com/babysteps/blog/2022/03/29/dyn-can-we-make-dyn-sized/
In any case, we might be able to not require the box if `Descriptor` is `Copy`, right? Which is the only ugly part of this.
In this case it is just a reference, but that's an artifact of the H2cEngine being CPU-based, so no copies are required. For GPU and FPGA engines, there will be a copy to device memory, except for Apple Metal due to unified memory. That said, once copied we will likely still hold an (owned) reference, as we would be in C FFI land.
> In any case, we might be able to not require the box if `Descriptor` is `Copy`, right? Which is the only ugly part of this.

For me this makes sense; I don't see a use-case for non-`Copy` descriptors.

The most complex descriptors should hold:
- Pointer(s) to data
- Size of the data
- Input data layout (for example canonical or bit-reversed permuted)
- Output data layout
- An enum or integer ID for the operation (MSM, FFT, Coset FFT)
- Device ID(s)

And if those are needed, maybe those data structures should live in a hashmap in the engine, and the engine just returns a handle ID as the descriptor.
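A sketch of that handle-ID alternative: the engine owns the (possibly device-copied) data in a map, and the descriptor handed back to the caller is a plain `Copy` integer wrapper. `StatefulEngine` and its fields are illustrative names; a `Vec<u64>` stands in for a device buffer, and `RefCell` stands in for whatever interior mutability or synchronization a real engine would use.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Copy descriptor: just an opaque handle ID, no borrow of caller data.
#[derive(Clone, Copy)]
struct CoeffsDescriptor(u64);

struct StatefulEngine {
    next_id: RefCell<u64>,
    // Stand-in for device-side buffers keyed by handle ID.
    cache: RefCell<HashMap<u64, Vec<u64>>>,
}

impl StatefulEngine {
    fn new() -> Self {
        StatefulEngine {
            next_id: RefCell::new(0),
            cache: RefCell::new(HashMap::new()),
        }
    }

    fn get_coeffs_descriptor(&self, coeffs: &[u64]) -> CoeffsDescriptor {
        // A GPU backend would copy/transform into device memory here.
        let mut id = self.next_id.borrow_mut();
        *id += 1;
        self.cache.borrow_mut().insert(*id, coeffs.to_vec());
        CoeffsDescriptor(*id)
    }

    fn msm_with_cached_scalars(&self, d: CoeffsDescriptor, base: &[u64]) -> u64 {
        let cache = self.cache.borrow();
        cache[&d.0].iter().zip(base).map(|(c, b)| c * b).sum()
    }
}

fn main() {
    let engine = StatefulEngine::new();
    let d = engine.get_coeffs_descriptor(&[1, 2, 3]);
    assert_eq!(engine.msm_with_cached_scalars(d, &[4, 5, 6]), 32);
}
```

The trade-off discussed above is visible here: the trait gets simpler (no associated types, `Copy` descriptors), but the engine must hold state, which is easy with Rust's `HashMap` and more annoying for C or C++ backends.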
```rust
pub trait ZalEngine {}

pub trait MsmAccel<C: CurveAffine>: ZalEngine {
    fn msm(&self, coeffs: &[C::Scalar], base: &[C]) -> C::Curve;
}
```
The user of this trait can either use this method, or the cached version. Perhaps this method could have a default implementation that uses the cached version underneath?
I think that can be a future optimization/discussion point later, with more provider feedback. Implementers always start with the stateless `msm`, and later, when they hit bottlenecks, they implement the cached version. So they'll likely focus on MSM and have the cached version fall back on the stateless `msm`, like I did in lines 117 to 127 in 2e6b89a:

```rust
fn msm_with_cached_scalars(&self, coeffs: &Self::CoeffsDescriptor<'_>, base: &[C]) -> C::Curve {
    best_multiexp(coeffs.raw, base)
}

fn msm_with_cached_base(&self, coeffs: &[C::Scalar], base: &Self::BaseDescriptor<'_>) -> C::Curve {
    best_multiexp(coeffs, base.raw)
}

fn msm_with_cached_inputs(&self, coeffs: &Self::CoeffsDescriptor<'_>, base: &Self::BaseDescriptor<'_>) -> C::Curve {
    best_multiexp(coeffs.raw, base.raw)
}
```
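For comparison, the inverse default suggested earlier (a stateless `msm` with a default implementation that goes through the cached path) could look like the sketch below. This is a hypothetical simplification with u64 stand-ins, not the proposed ZAL trait.

```rust
// Simplified trait: `msm` has a default impl built on the cached entry points,
// so an implementer only has to write the cached code path.
trait MsmAccel {
    type CoeffsDescriptor<'c>;
    type BaseDescriptor<'b>;

    fn get_coeffs_descriptor<'c>(&self, coeffs: &'c [u64]) -> Self::CoeffsDescriptor<'c>;
    fn get_base_descriptor<'b>(&self, base: &'b [u64]) -> Self::BaseDescriptor<'b>;
    fn msm_with_cached_inputs(
        &self,
        coeffs: &Self::CoeffsDescriptor<'_>,
        base: &Self::BaseDescriptor<'_>,
    ) -> u64;

    // Default: the stateless call creates throwaway descriptors.
    fn msm(&self, coeffs: &[u64], base: &[u64]) -> u64 {
        let c = self.get_coeffs_descriptor(coeffs);
        let b = self.get_base_descriptor(base);
        self.msm_with_cached_inputs(&c, &b)
    }
}

struct CpuEngine;
struct Coeffs<'c>(&'c [u64]);
struct Base<'b>(&'b [u64]);

impl MsmAccel for CpuEngine {
    type CoeffsDescriptor<'c> = Coeffs<'c>;
    type BaseDescriptor<'b> = Base<'b>;

    fn get_coeffs_descriptor<'c>(&self, coeffs: &'c [u64]) -> Coeffs<'c> {
        Coeffs(coeffs)
    }
    fn get_base_descriptor<'b>(&self, base: &'b [u64]) -> Base<'b> {
        Base(base)
    }
    fn msm_with_cached_inputs(&self, c: &Coeffs<'_>, b: &Base<'_>) -> u64 {
        c.0.iter().zip(b.0).map(|(x, y)| x * y).sum()
    }
}

fn main() {
    // The stateless call is inherited from the trait's default impl.
    assert_eq!(CpuEngine.msm(&[1, 2], &[3, 4]), 11); // 1*3 + 2*4
}
```

The direction chosen in the PoC is the opposite (cached methods default to the stateless `best_multiexp`), which matches the expectation that providers implement the stateless path first.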
(force-pushed 2e6b89a to 542bea1)
Following discussion on Thursday (2024-02-15), superseded by privacy-scaling-explorations/halo2#277.
PoC branch of the ZK Abstraction Layer.

Next steps: […]

Out of scope:

We keep the PoC simple and effective so that it can serve as a basis for discussion; hence the following are out of scope: […]