Further NTT optimizations #1011

divergentdave · 2024-04-23T18:15:22Z

Runtime of Prio3 is largely dominated by NTT, INTT, and ancillary field multiplications. We could benefit from adapting techniques from the literature for implementing the NTT over smaller fields for Ring-LWE cryptosystems (and in turn from digital signal processing literature on efficient FFT implementation).

I skimmed over Speeding up the Number Theoretic Transform for Faster Ideal Lattice-Based Cryptography, and there were a few ideas from this paper and previous works it cites that could be applicable. For example, one could make the forward transform return its results in bit-reversed order, and have the inverse transform take its input in bit-reversed order. We could probably adopt this for polynomial multiplication, in order to skip two bit reversed reordering steps. Other uses of the NTT may need to be adapted to this change. We can't use the loose bound/tight bound representations of field elements, (in order to skip modular reduction after additions) because our field modulus is already too close to machine word sizes.

There was also a talk at RWC 2024 on RISC V implementations of cryptographic algorithms, (slides, recording) this included a section on applying autovectorization to NTT implementations. I'm not sure if our field multiplications would be too long for the compiler to see through, but the LLVM-related diagnostic tooling might be generally applicable to our case.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Further NTT optimizations #1011

Further NTT optimizations #1011

divergentdave commented Apr 23, 2024

Further NTT optimizations #1011

Further NTT optimizations #1011

Comments

divergentdave commented Apr 23, 2024