Speed-up multiplication by small integers, and improve lookup part of compute_quotient_poly() #1153
Thanks for this @Nashtare, I'll have a look at this in detail tomorrow. So I can compare when I test it myself, could you please post what concrete performance improvement you observed and how you measured it?
@unzvfu The main gains are with respect to compute_quotient_poly() when lookups are involved. For this I modified the bench_recursion example to have the third case build a circuit of As for the rest, the improvements are minimal as not part of some bottleneck for the prover. For instance I added a local logger on the evm side for the step computing
I'd also be curious in seeing how much of a difference
@Nashtare: Is there a way you can share your benchmark code with me (is it available as a branch in your repo)? I agree with Jacqui that I would have expected the compiler to take care of many of the instances where Could you please also refactor all the code that was copy-pasted from
Ah, interesting, in which case indeed most of the changes in this PR are useless 😅 I wouldn't have expected the compiler to optimize full-reduction that well with smaller inputs. @unzvfu I've added a
Ok, so I've finally got what I believe is a fair comparison for "compute quotient polys", but unfortunately I don't see any performance difference. Could you please verify that I'm checking the right thing though? Here are two example runs; first
and then
Could you confirm that you are compiling with
Hmm that's weird, I definitely got variable performance improvements (also tried on larger tables). I was compiling with On the other hand, from playing with godbolt, it seems Jacqueline was right and that the compiler can properly infer small enough inputs when performing modular reduction, making the mul_u32() method somewhat redundant. Due to this latter point, and the fact that the quotient poly improvement seems somehow not consistent, should we just close this?
If you're happy to close, that's okay with me. Also happy to revisit the idea later if you have an idea of how it might be made to (more definitively) bear fruit.
Yeah let's close it for now. The quotient_poly part wasn't the biggest overhead with lookups anyway. We're working internally on making those more efficient to prove, we should come up with something relatively soon!
This PR introduces a mul_u32() method for Field, to be used to speed up multiplication by small values, though funnily the places where I was originally planning to use it ended up leveraging F::from_noncanonical_u96() instead. The gain in compute_quotient_poly() when lookups are involved is pretty important.

Additionally, it speeds up places where the multiplicative generator is involved, as that one is always small for prime fields. Namely, I've added a SmallPowers struct (mimicking Powers), and used it wherever possible. Note that the fft speed-ups can't be integrated directly in the fft module as it is defined over Field, for which we cannot convert to primitive types.

@nbgl I think there may be some improvements to get by having a similar method for PackedField, however I am not really familiar with AVX instructions. I added a blanket impl for now, with TODOs for Goldilocks specialization.

closes #869
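
For readers who want to see the idea in isolation, here is a minimal standalone sketch, not the code from this PR. It assumes the Goldilocks prime 2^64 - 2^32 + 1 and works on raw u64 values rather than the Field trait. The names mul_u32 and SmallPowers are borrowed from the description above, but their bodies here are my own assumption; reduce96 and small_powers are hypothetical helpers introduced only for illustration.

```rust
/// Goldilocks prime: 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Reference multiplication: widen to u128 and reduce with `%`.
pub fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % (P as u128)) as u64
}

/// Reduce a value x < 2^96 modulo P (hypothetical helper).
/// Uses the identity 2^64 = 2^32 - 1 (mod P).
pub fn reduce96(x: u128) -> u64 {
    debug_assert!(x >> 96 == 0);
    let lo = x as u64 as u128; // low 64 bits
    let hi = x >> 64; // high bits, below 2^32
    // hi * (2^32 - 1) < 2^64, so r < 2^65 < 3 * P.
    let mut r = lo + hi * 0xFFFF_FFFF;
    // At most two subtractions are needed because r < 3 * P.
    while r >= P as u128 {
        r -= P as u128;
    }
    r as u64
}

/// Sketch of a multiplication by a small value: the product of a u64 and
/// a u32 fits in 96 bits, so the cheaper 96-bit reduction is enough.
pub fn mul_u32(a: u64, b: u32) -> u64 {
    reduce96(a as u128 * b as u128)
}

/// Sketch of an iterator over successive powers of a small base,
/// yielding base^0, base^1, base^2, ... (all reduced mod P).
pub struct SmallPowers {
    base: u32,
    current: u64,
}

impl Iterator for SmallPowers {
    type Item = u64;

    fn next(&mut self) -> Option<u64> {
        let result = self.current;
        self.current = mul_u32(self.current, self.base);
        Some(result)
    }
}

/// Hypothetical constructor: start the sequence at base^0 = 1.
pub fn small_powers(base: u32) -> SmallPowers {
    SmallPowers { base, current: 1 }
}
```

For example, small_powers(7).take(4) yields 1, 7, 49, 343. As the discussion above concluded, the compiler may already specialise the generic reduction when it can see that one operand is small, so a hand-written version like this is worth benchmarking before adopting.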