Unroll for loops with integer literal bounds using an attribute-like macro.
This crate provides a attribute-like macro unroll_for_loops
that can be applied to functions. This
macro looks for loops to unroll and unrolls them at compile time.
There is already a crate called Crunchy that unrolls loops with
a macro_rules!
macro. This crate provides an alternative solution for unrolling loops using an
attribute-like macro instead. Some benefits of an item based attribute-like macro for unrolling
loops are:
- Macro invocation doesn't pollute the algorithm with annotations. Only one annotation is needed for each function.
- Can use variables defined outside of the macro invocation.
- Can be extended to do partial loop unrolling
Just add #[unroll_for_loops]
above the function whose for loops you would like to unroll.
Currently all for loops with integer literal bounds will be unrolled, although this macro currently
can't see inside complex code (e.g. for loops within closures).
The following example demonstrates the kinds of code that can be unrolled automatically.
#[unroll_for_loops]
fn main() {
println!("matrix = ");
for i in 0..10 {
for j in 0..10 {
print!("({:?}, {:?})", i, j);
}
println!("");
}
}
Of course not all code benefits from unrolling. For instance the code above is bound by I/O, and hence likely wont benefit from unrolling.
On the other hand, computational tasks, such as matrix multiplication, can benefit significantly as demonstrated by this simple example of a matrix-vector product:
#[unroll_for_loops]
fn mtx_vec_mul(mtx: &[[f64; 5]; 5], vec: &[f64; 5]) -> [f64; 5] {
let mut out = [0.0; 5];
for col in 0..5 {
for row in 0..5 {
out[row] += mtx[col][row] * vec[col];
}
}
out
}
A quick benchmark of this code generated by criterion shows a 3x performance increase with loop unrolling:
Matrix-Vector Product/Ordinary
time: [2.7254 ns 2.7657 ns 2.8153 ns]
Matrix-Vector Product/Unrolled
time: [878.71 ps 882.96 ps 888.01 ps]
The relative performance difference can be visualized in the violin plot:
There are also places where this macro has no benefit where one might be expected. For instance, when computing a sum of elements in a large array, one might break the computation into independent blocks to allow the CPU to perform out of order execution optimizations. For example the following code:
fn sum(data: &[f64]) -> f64 {
let mut sum = 0.0;
for i in 0..data.len() {
sum += data[i];
}
sum
}
can get a substantial boost in performance when computed in chunks as:
fn sum(data: &[f64]) -> f64 {
let mut sum = [0.0; 32];
for i in (0..data.len()).step_by(32) {
for j in 0..32 {
sum[j] += data[i + j];
}
}
sum.iter().sum()
}
However this example is unlikely to benefit from the unroll_for_loops
macro. A
benchmark for unrolling sums is included for verification.
Having said this, a macro that can partially unroll loops as shown above can be very useful and is left for future work.
Contributions are welcome!
I would like to thank the author of Crunchy for the initial great solution to this problem.
This repository is licensed under either of
- Apache License, Version 2.0, (LICENSE-APACHE or https://www.apache.org/licenses/LICENSE-2.0)
- MIT License (LICENSE-MIT or https://opensource.org/licenses/MIT)
at your option.