Bernstein yang modular multiplicative inverter #2

AlekseiVambol · 2023-08-21T13:39:19Z

Replaces the multiplicative inversion based on Fermat's Little Theorem with Bernstein-Yang multiplicative inversion. The latter turned out to be about 8.5 times faster on average.

johntaiko · 2023-08-22T00:36:25Z

Hello, Aleksei. It looks like your code has not been formatted; please run:
cargo fmt

mratsim

Only general suggestions for improvement.

A good addition would be an inversion benchmark here https://github.com/taikoxyz/halo2curves/blob/main/benches/bn256_field.rs

mratsim · 2023-08-23T06:47:45Z

src/bernsteinyang.rs

+struct ChunkInt<const B:usize, const L:usize>(pub [u64; L]);
+
+impl<const B:usize, const L:usize> ChunkInt<B,L> {
+    /// Mask, in which the B lowest bits are 1 and only they


"and only they"

Seems like the sentence is cut.

No, it means "they and only they are 1".
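For illustration, a minimal sketch of such a mask, assuming B is between 1 and 64. The helper name `low_bits_mask` is hypothetical and is not taken from the PR:

```rust
/// Hypothetical helper: returns a u64 in which the `b` lowest bits,
/// and only those bits, are set. Assumes 1 <= b <= 64.
const fn low_bits_mask(b: u32) -> u64 {
    u64::MAX >> (64 - b)
}

fn main() {
    let mask = low_bits_mask(62);
    // Exactly 62 bits are set, and they are the lowest ones.
    assert_eq!(mask.count_ones(), 62);
    assert_eq!(mask, (1u64 << 62) - 1);
    println!("{mask:#x}");
}
```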

mratsim · 2023-08-23T06:52:37Z

src/bernsteinyang.rs

+/// The ordering of the chunks in these arrays is little-endian.
+/// The arithmetic operations for this type are wrapping ones.
+#[derive(Clone)]
+struct ChunkInt<const B:usize, const L:usize>(pub [u64; L]);


I think these are usually called unsaturated integers, see:

golang/go@b0c49ae

an unsaturated 51-bit limb field implementation optimized for 64-bit
architectures and math/bits.Mul64 intrinsics

https://github.com/mit-plv/fiat-crypto/tree/42e5455a3f95ee4f739245e75b667e0639beac51#usage-generating-c-files

There is a separate compiler binary for each implementation strategy:

saturated_solinas

unsaturated_solinas

word_by_word_montgomery

It seems that this term is rarely used. On the other hand, "saturated integer" is usually used to describe a type for which saturation arithmetic is implemented: https://en.wikipedia.org/wiki/Saturation_arithmetic At least, my first thought was about saturation arithmetic when I saw this term somewhere before. Thus, I will leave this comment "as is" to avoid ambiguity.

mratsim · 2023-08-23T06:55:47Z

src/bernsteinyang.rs

+            data[i - 1] = self.0[i];
+        }
+        if self.is_negative() {
+            data[L - 1] = Self::MASK;


~~does the mask in data[L - 2] need to be cancelled?~~

A comment is needed here, at least to draw attention to a potential bug during review or audit, and to guide test input design.

Edit: Ah, I understand the representation now. All limbs are in two's complement and negated, not just the most significant word.

To dispel any doubts, I perform this:

mratsim · 2023-08-23T07:01:11Z

src/bernsteinyang.rs

+        let (mut data, mut carry) = ([0; L], 0);
+        for i in 0..L {
+            let sum = self.0[i] + other.0[i] + carry;
+            data[i] = sum & ChunkInt::<B,L>::MASK;


What if the sum is negative? You remove the negative tag there.

The value of "sum" is never negative. I actually operate on two non-negative integers in [0 .. 2 ^ (B * L) - 1], split into B-bit non-negative chunks, but any x in [2 ^ (B * L - 1) .. 2 ^ (B * L) - 1] is considered to be the representation of -|2 ^ (B * L) - x|. The Mul, Add and Sub algorithms for two's complement code do not depend on the sign, and this is the main reason why the majority of processors use them. Since ChunkInt has a fixed size, it can afford to use this convenient code.
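The reply above can be illustrated with a small sketch. It assumes B = 62 and L = 2, and the names `Chunks`, `from_i128`, `is_negative` and `add` are hypothetical stand-ins, not the PR's actual ChunkInt code. The point is that the chunk-wise sum is always a non-negative u64, while the sign lives only in the interpretation of the top bit:

```rust
const B: u32 = 62;
const L: usize = 2;
const MASK: u64 = u64::MAX >> (64 - B);

/// Hypothetical 124-bit two's complement integer stored as two 62-bit chunks,
/// least significant chunk first.
#[derive(Clone, Debug, PartialEq)]
struct Chunks([u64; L]);

impl Chunks {
    /// Takes the low B * L bits of the two's complement pattern of `v`.
    fn from_i128(v: i128) -> Self {
        let u = v as u128;
        Chunks([(u as u64) & MASK, ((u >> B) as u64) & MASK])
    }

    /// The value is negative when the highest bit of the top chunk is set.
    fn is_negative(&self) -> bool {
        self.0[L - 1] > (MASK >> 1)
    }

    /// Wrapping addition modulo 2^(B * L); the sign is never inspected.
    fn add(&self, other: &Self) -> Self {
        let (mut data, mut carry) = ([0u64; L], 0u64);
        for i in 0..L {
            // Each operand chunk is below 2^62, so the sum fits in a u64.
            let sum = self.0[i] + other.0[i] + carry;
            data[i] = sum & MASK;
            carry = sum >> B;
        }
        Chunks(data)
    }
}

fn main() {
    let (a, b) = (Chunks::from_i128(-5), Chunks::from_i128(3));
    assert!(a.is_negative());
    assert_eq!(a.add(&b), Chunks::from_i128(-2));
    println!("{:?}", a.add(&b));
}
```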

mratsim · 2023-08-23T07:02:57Z

src/bernsteinyang.rs

+impl<const B:usize, const L:usize> Sub for &ChunkInt<B,L> {
+    type Output = ChunkInt<B,L>;
+    fn sub(self, other: Self) -> Self::Output {
+        let (mut data, mut carry) = ([0; L], 1);


This needs a comment to explain that

-x = flip the bits of x and add 1.

Agreed. This part of the code has not been commented yet.
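As a sketch of what such a comment would describe, assuming B = 62 and L = 2 and using a hypothetical free function `sub` on raw chunk arrays (not the PR's `Sub` impl): subtraction is addition of the bit-flipped operand, with the initial carry of 1 supplying the "add 1" part of the negation.

```rust
const B: u32 = 62;
const L: usize = 2;
const MASK: u64 = u64::MAX >> (64 - B);

/// Hypothetical sketch: computes a - b modulo 2^(B * L) on little-endian chunks.
/// Since -b = (b with all bits flipped) + 1, we add the flipped chunks of b
/// and start with carry = 1 instead of 0.
fn sub(a: &[u64; L], b: &[u64; L]) -> [u64; L] {
    let (mut data, mut carry) = ([0u64; L], 1u64);
    for i in 0..L {
        // XOR with MASK flips exactly the B meaningful bits of the chunk.
        let sum = a[i] + (b[i] ^ MASK) + carry;
        data[i] = sum & MASK;
        carry = sum >> B;
    }
    data
}

fn main() {
    // 5 - 3 = 2
    assert_eq!(sub(&[5, 0], &[3, 0]), [2, 0]);
    // 3 - 5 = -2, whose two's complement pattern is all ones except bit 0.
    assert_eq!(sub(&[3, 0], &[5, 0]), [MASK - 1, MASK]);
    println!("ok");
}
```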

mratsim · 2023-08-23T07:18:14Z

src/bernsteinyang.rs

+            }
+
+            let mask = (1 << steps.min(1 - delta).min(4)) - 1;
+            let w = (g as i64).wrapping_mul(f.wrapping_mul(3) ^ 12) & mask;


This needs a comment in the vein of

Find the multiple of f to add to cancel the bottom min(steps, 4) bits of g

Agreed. This part of the code has not been commented yet; "f.wrapping_mul(3) ^ 12" will also be explained a bit.
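A rough sketch of the idea behind these two lines, under stated assumptions: `cancel_multiplier` is a hypothetical name, only the min(steps, 4) part of the mask is modelled (the `1 - delta` bound is left out), and f is assumed odd. For odd f, `(3 * f) ^ 12` is congruent to -1 / f modulo 16, so w = g * (-1 / f) is the multiple of f that cancels the low bits of g:

```rust
/// Hypothetical sketch: returns w such that g + w * f has its `k` lowest bits
/// equal to zero, for odd `f` and `k <= 4`.
fn cancel_multiplier(f: i64, g: i64, k: u32) -> i64 {
    assert!((f & 1) == 1 && k <= 4);
    let mask = (1i64 << k) - 1;
    // (3 * f) ^ 12 is -1 / f (mod 16) for odd f in two's complement code.
    g.wrapping_mul(f.wrapping_mul(3) ^ 12) & mask
}

fn main() {
    let (f, g) = (7i64, 13i64);
    let w = cancel_multiplier(f, g, 4);
    // The 4 lowest bits of g + w * f are cleared.
    assert_eq!((g + w * f) & 15, 0);
    println!("w = {w}, g + w * f = {}", g + w * f);
}
```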

mratsim · 2023-08-23T07:22:20Z

src/bernsteinyang.rs

+        (cd.shift(), ce.shift())   
+    }
+
+    fn norm(&self, mut value: ChunkInt<B,L>, negate: bool) -> ChunkInt<B,L> {


This should indicate the input range and output range

AFAIK:

Compute a = sign*a (mod M)

with a in range (-2*M, M)
result in range [0, M)

also why is the value mutated and returned as well?

Agreed. I was going to comment this part of the code.

also why is the value mutated and returned as well?

It is declared mutable so that it can be changed within the function without introducing a new temporary variable.
It is returned because the function takes ownership of the argument instead of borrowing it mutably.
It is returned instead of being changed through a mutable reference because I want all functions to be pure: https://en.wikipedia.org/wiki/Pure_function
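A simplified sketch of this pattern, using a plain i128 instead of ChunkInt. The input range (-2 * M, M) and output range [0, M) follow the reviewer's "AFAIK" reading above and are an assumption, as is the extra `modulus` parameter (the real method reads the modulus from `self`):

```rust
/// Hypothetical scalar analogue of `norm`: returns sign * value (mod M) in [0, M),
/// assuming -2 * M < value < M. The argument is taken by value and declared `mut`,
/// so it can be updated locally and returned without touching the caller's data.
fn norm(mut value: i128, negate: bool, modulus: i128) -> i128 {
    if value < 0 {
        value += modulus; // now -M < value < M
    }
    if negate {
        value = -value; // still -M < value < M
    }
    if value < 0 {
        value += modulus; // now 0 <= value < M
    }
    value
}

fn main() {
    let m = 7;
    assert_eq!(norm(-13, false, m), 1); // -13 = -2 * 7 + 1
    assert_eq!(norm(5, true, m), 2); // -5 = -1 * 7 + 2
    println!("ok");
}
```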

mratsim · 2023-08-23T07:23:15Z

src/bernsteinyang.rs

+    }
+
+    /// Returns the multiplicative inverse of the argument modulo 2^B. The implementation is based
+    /// on the Hurchalla's method for computing the multiplicative inverse modulo a power of two


Missing link to paper

I usually try to avoid overloading my code with direct links to papers that are easy to find, since I aim for better readability and aesthetics. However, I can add this one.
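For readers unfamiliar with inversion modulo a power of two, here is a minimal sketch of the classic Newton (Hensel lifting) iteration for 2^64. It is not Hurchalla's exact operation sequence and not the PR's code; `inv_mod_2_64` is a hypothetical name. Each iteration doubles the number of correct low bits, and masking the result to B bits gives the inverse modulo 2^B for any B up to 64:

```rust
/// Hypothetical sketch: multiplicative inverse of an odd `a` modulo 2^64.
fn inv_mod_2_64(a: u64) -> u64 {
    assert!((a & 1) == 1, "only odd numbers are invertible modulo 2^64");
    // Montgomery's starting value: correct in the 5 lowest bits.
    let mut x = a.wrapping_mul(3) ^ 2;
    // Newton step x <- x * (2 - a * x): 5 -> 10 -> 20 -> 40 -> 80 correct bits.
    for _ in 0..4 {
        x = x.wrapping_mul(2u64.wrapping_sub(a.wrapping_mul(x)));
    }
    x
}

fn main() {
    let a = 0x1234_5678_9abc_def1u64;
    let x = inv_mod_2_64(a);
    assert_eq!(a.wrapping_mul(x), 1);
    println!("{x:#x}");
}
```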

mratsim · 2023-08-23T07:24:43Z

src/derive/field.rs

@@ -24,6 +24,11 @@ macro_rules! field_common {
        $r2:ident,
        $r3:ident
    ) => {
+        /// Bernstein-Yang modular multiplicative inverter created for the modulus equal to
+        /// the characteristic of the field to invert positive integers in the Montgomery form.
+        const BYINVERTOR: crate::bernsteinyang::BYInverter<62,6> = 


for BN254 and secp256k1, BYInverter<62,5> is sufficient.

No. I have checked this both experimentally (running the code for the bn254 Fq modulus with BYInverter<62,5>) and theoretically (deriving the formula 2 ^ (B * (L - 1) - 2) for the threshold for the values of the modulus and the argument).
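The arithmetic behind this answer can be sketched as follows. The sketch reads the quoted threshold 2 ^ (B * (L - 1) - 2) as an upper bound that the modulus must stay below (an assumption about its exact meaning), and `min_chunks` is a hypothetical helper. The BN254 base field modulus has 254 bits and the secp256k1 one has 256 bits:

```rust
/// Hypothetical helper: the smallest L such that every modulus with
/// `modulus_bits` bits lies below 2^(B * (L - 1) - 2).
fn min_chunks(b: u32, modulus_bits: u32) -> u32 {
    let mut l = 1;
    // An n-bit modulus is below 2^T exactly when n <= T, with T = B * (L - 1) - 2.
    while b * (l - 1) < modulus_bits + 2 {
        l += 1;
    }
    l
}

fn main() {
    // With B = 62 and L = 5 the threshold is 2^246, below a 254-bit modulus.
    assert_eq!(62 * (5 - 1) - 2, 246);
    assert_eq!(min_chunks(62, 254), 6);
    assert_eq!(min_chunks(62, 256), 6);
    println!("ok");
}
```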

mratsim · 2023-08-23T07:29:27Z

src/bernsteinyang.rs

+        let mut matrix;
+        while g != ChunkInt::ZERO {
+            (delta, matrix) = Self::step(&f, &g, delta);
+            (f, g) = Self::fg(f, g, matrix);


Comment: fg updates can take a parameter "limbsLeft" to avoid iterating on the full L all the time.

This might be left as a future optimization.

See: https://github.com/mratsim/constantine/blob/f57d071f1192a4039979a3baf6c835b89841bcfa/constantine/math/arithmetic/limbs_exgcd.nim#L836-L839

I thought about this while planning the implementation, but decided not to do it, because:

It will require some extra time to determine the number of unused limbs after each arithmetic operation;

Only the f,g-update can benefit from it;

For the two's complement code working with fixed-length register-like structure is much more convenient.

For a much larger modulus this optimization makes sense, but I am not sure that it is needed here.

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." - Donald Knuth

AlekseiVambol · 2023-08-28T10:54:14Z

Update summary:

I have read that Montgomery's formula (3 * x) xor 2 = 1 / x for an odd integer x in two's complement code holds not only (mod 16), but also (mod 32). Thus, I have used it to derive the formula (3 * x) xor 28 = -1 / x (mod 32) and integrated it into the method for computing the transition matrix in order to nullify the 5 lowest bits instead of 4. This increased the overall performance by 2-3%.
The number of basic steps of the Bernstein-Yang method for which the transition matrix is computed has been hard-coded as 62.
A benchmark for multiplicative inversion in the base field of the bn256 curve has been added.
Comments have been added.
Some minor changes proposed by the Clippy tool have been made.
Standard code formatting has been applied.
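Both identities from the first item of the summary can be checked exhaustively, since there are only 16 odd residues modulo 32. The function names in this small sketch are hypothetical:

```rust
/// (3 * x) xor 2, reduced modulo 32: claimed to equal 1 / x for odd x.
fn inv_mod_32(x: u32) -> u32 {
    (x.wrapping_mul(3) ^ 2) & 31
}

/// (3 * x) xor 28, reduced modulo 32: claimed to equal -1 / x for odd x.
fn neg_inv_mod_32(x: u32) -> u32 {
    (x.wrapping_mul(3) ^ 28) & 31
}

fn main() {
    for x in (1u32..32).step_by(2) {
        // x * (1 / x) = 1 (mod 32)
        assert_eq!(x * inv_mod_32(x) % 32, 1);
        // x * (-1 / x) + 1 = 0 (mod 32)
        assert_eq!((x * neg_inv_mod_32(x) + 1) % 32, 0);
    }
    println!("both identities hold for all odd x modulo 32");
}
```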

The current implementation of finite field multiplicative inversion is approximately 9.4 times faster than the initial one based on Fermat's Little Theorem:

mratsim

Awesome. I think the source filename should be "invmod.rs" or "modinv.rs" instead of Bernstein-Yang (i.e. what it does instead of how it does it), but the PR is in good shape, and it will likely take a very long time for the PSE team to review, so better to merge now and address their concerns.

…s#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]>

AlekseiVambol added 5 commits August 21, 2023 16:17

Add files via upload

30f5259

Add files via upload

f81a9d9

Add files via upload

4a501e9

Add files via upload

687f55f

Add files via upload

156ae51

mratsim self-requested a review August 22, 2023 13:29

mratsim mentioned this pull request Aug 23, 2023

Avoid computing square roots where possible in hash_to_curve privacy-scaling-explorations/halo2curves#73

Closed

mratsim approved these changes Aug 23, 2023

AlekseiVambol added 6 commits August 28, 2023 13:02

Add files via upload

7e0b17e

Add files via upload

fd01ecf

Add files via upload

a1476bb

Add files via upload

0fb5b8f

Add files via upload

5e36016

Add files via upload

62e2846

mratsim approved these changes Aug 29, 2023

mratsim merged commit ba112ff into main Aug 29, 2023
7 checks passed

mratsim mentioned this pull request Aug 29, 2023

Fast modular inverse - 9.4x acceleration privacy-scaling-explorations/halo2curves#83

Merged

mratsim pushed a commit that referenced this pull request Sep 6, 2023

Bernstein yang modular multiplicative inverter (#2)

5b0e99e

mratsim pushed a commit that referenced this pull request Oct 13, 2023

Bernstein yang modular multiplicative inverter (#2)

e0587f4

mratsim mentioned this pull request Nov 2, 2023

[Experiment] Merge back upstream changes for #2 and #3 #4

Merged

mratsim deleted the Bernstein-Yang-modular-multiplicative-inverter branch November 2, 2023 16:24
