This document tracks which optimizations have been done after the initial implementation passed corpus tests and a good amount of fuzzing.
These optimizations introduced more unsafe code. They should yield significant improvements; otherwise they are not really worth it.
The reverse bit reader function bitreader_reversed::get_bits was identified by the Linux perf tool as using about 36% of the total run time.
- Benchmark: decode enwik9
- Before: about 14.7 seconds
- After: about 12.2 seconds, with about 25% of the time spent in get_bits()
decodebuffer::repeat was identified by the Linux perf tool as using about 28% of the total run time.
- Benchmark: decode enwik9
- Before: about 9.9 seconds
- After: about 9.4 seconds
The decode buffer must be able to do two things efficiently:
- Collect bytes from the front
- Copy bytes from the contents to the end
The standard library's VecDeque and Vec can each do one of these efficiently but not the other, so a custom ring buffer implementation was written.
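To make the two requirements concrete, here is a minimal sketch of a fixed-capacity ring buffer that supports both operations. It is an illustration only, not the implementation used in this project: the type `RingBuffer` and the methods `extend`, `drain_front` and `repeat` are hypothetical names, the buffer never grows, and it uses safe byte-by-byte indexing, where a tuned version would more likely copy whole slices at once.

```rust
/// Hypothetical fixed-capacity ring buffer (illustration only).
pub struct RingBuffer {
    buf: Vec<u8>,
    head: usize, // index of the oldest stored byte
    len: usize,  // number of bytes currently stored
}

impl RingBuffer {
    pub fn with_capacity(cap: usize) -> Self {
        assert!(cap > 0, "capacity must be non-zero");
        RingBuffer { buf: vec![0; cap], head: 0, len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Append one byte at the end. This sketch panics when full;
    /// a real decode buffer would grow or flush instead.
    pub fn push(&mut self, byte: u8) {
        assert!(self.len < self.buf.len(), "ring buffer full");
        let tail = (self.head + self.len) % self.buf.len();
        self.buf[tail] = byte;
        self.len += 1;
    }

    /// Append a slice of bytes at the end.
    pub fn extend(&mut self, data: &[u8]) {
        for &b in data {
            self.push(b);
        }
    }

    /// Operation 1: collect up to `n` bytes from the front.
    /// Only `head` moves; the remaining bytes are not shifted.
    pub fn drain_front(&mut self, n: usize) -> Vec<u8> {
        let n = n.min(self.len);
        let cap = self.buf.len();
        let mut out = Vec::with_capacity(n);
        for i in 0..n {
            out.push(self.buf[(self.head + i) % cap]);
        }
        self.head = (self.head + n) % cap;
        self.len -= n;
        out
    }

    /// Operation 2: copy `match_length` bytes, starting `offset` bytes
    /// before the end, onto the end. Copying one byte at a time means an
    /// overlapping match (offset < match_length) repeats the pattern.
    pub fn repeat(&mut self, offset: usize, match_length: usize) {
        assert!(offset >= 1 && offset <= self.len, "offset out of range");
        let cap = self.buf.len();
        for _ in 0..match_length {
            let src = (self.head + self.len - offset) % cap;
            let byte = self.buf[src];
            self.push(byte);
        }
    }
}
```

In this sketch, draining from the front costs only the bytes actually read, which is where a plain Vec falls short because it has to shift the remainder. Because `repeat` addresses the storage by index, there is no need to first make the contents contiguous.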
These are just nice to have
Studying this material led to a big improvement in bitreader speed