You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Runtime of Prio3 is largely dominated by NTT, INTT, and ancillary field multiplications. We could benefit from adapting techniques from the literature for implementing the NTT over smaller fields for Ring-LWE cryptosystems (and in turn from digital signal processing literature on efficient FFT implementation).
I skimmed over Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography, and there were a few ideas from this paper and previous works it cites that could be applicable. For example, one could make the forward transform return its results in bit-reversed order, and have the inverse transform take its input in bit-reversed order. We could probably adopt this for polynomial multiplication, in order to skip two bit reversed reordering steps. Other uses of the NTT may need to be adapted to this change. We can't use the loose bound/tight bound representations of field elements, (in order to skip modular reduction after additions) because our field modulus is already too close to machine word sizes.
There was also a talk at RWC 2024 on RISC V implementations of cryptographic algorithms, (slides, recording) this included a section on applying autovectorization to NTT implementations. I'm not sure if our field multiplications would be too long for the compiler to see through, but the LLVM-related diagnostic tooling might be generally applicable to our case.
The text was updated successfully, but these errors were encountered:
Runtime of Prio3 is largely dominated by NTT, INTT, and ancillary field multiplications. We could benefit from adapting techniques from the literature for implementing the NTT over smaller fields for Ring-LWE cryptosystems (and in turn from digital signal processing literature on efficient FFT implementation).
I skimmed over Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography, and there were a few ideas from this paper and previous works it cites that could be applicable. For example, one could make the forward transform return its results in bit-reversed order, and have the inverse transform take its input in bit-reversed order. We could probably adopt this for polynomial multiplication, in order to skip two bit reversed reordering steps. Other uses of the NTT may need to be adapted to this change. We can't use the loose bound/tight bound representations of field elements, (in order to skip modular reduction after additions) because our field modulus is already too close to machine word sizes.
There was also a talk at RWC 2024 on RISC V implementations of cryptographic algorithms, (slides, recording) this included a section on applying autovectorization to NTT implementations. I'm not sure if our field multiplications would be too long for the compiler to see through, but the LLVM-related diagnostic tooling might be generally applicable to our case.
The text was updated successfully, but these errors were encountered: