You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Was skimming the code, and couldn't find any sped up routine for multiplying by the primitive root for the goldilocks field.
I believe this is an operation that should be getting done on O(N) points for an N-element proof, due to taking a LDE of the input onto a coset. (Or having the codeword be a coset, though my recollection is thats actually slower)
The goldilocks field's smallest generator is 7, so should have a faster method of computing it via additions. If theres a faster mul_by_2 routine on this field, then it could be (roughly):
Unknown how many times in the code you actually multiply by the primitive root, or how much of a speedup factor this would be in the small field, but thought I'd mention it in case it helps. (In Arkworks for BLS12-381, I at some point thought this would be more than 2x as fast as the general multiplication. But thats a much larger field, and because multiplying by two was twice as fast as normal addition)
The text was updated successfully, but these errors were encountered:
I haven't benchmarked it in the context of plonky2, so my comment is to be taken with a pinch of salt, but with this field size, the ratio between addition and multiplication is much smaller than with large fields (like on BLS12-381 for instance where it is around 4x/5x if I recall correctly?). In our curve implementation, the ratio is around 1.4, so I doubt such approach would yield any advantage, but maybe my assumptions are wrong with how arithmetic is performed here, especially with assembly instructions..
EDIT: To complete, in our approach, multiplying by the generator (or any u32 value) is done through a specific mul_by_u32() method which mimics the reduction of regular field multiplication, but ignores the highest limb (which is 0 here), effectively being faster than regular multiplication.
Yeah seems like a good idea to special case 7. As Robin said it might be best to still use multiplication, but the reduction should be somewhat faster.
I think a 96-bit reduction would have the same cost as a 67-bit reduction (cc @nbgl, correct me if wrong), so we could implement mul_by_u32 and it might have other uses.
We used to do something like that for our (now deleted) CrandallField, and there's still a from_noncanonical_u96 method in the code, though it's not implemented yet for GoldilocksField.
I don't think multiplication by a generator is a significant cost for us so we might not immediately notice a speedup, but it could add up if we also used mul_by_u32 when evaluating certain constraints etc.
Was skimming the code, and couldn't find any sped up routine for multiplying by the primitive root for the goldilocks field.
I believe this is an operation that should be getting done on O(N) points for an N-element proof, due to taking a LDE of the input onto a coset. (Or having the codeword be a coset, though my recollection is thats actually slower)
The goldilocks field's smallest generator is
7
, so should have a faster method of computing it via additions. If theres a fastermul_by_2
routine on this field, then it could be (roughly):Unknown how many times in the code you actually multiply by the primitive root, or how much of a speedup factor this would be in the small field, but thought I'd mention it in case it helps. (In Arkworks for BLS12-381, I at some point thought this would be more than 2x as fast as the general multiplication. But thats a much larger field, and because multiplying by two was twice as fast as normal addition)
The text was updated successfully, but these errors were encountered: