Audit seahash #66

chills42 · 2020-02-25T13:34:10Z

Seahash is an implementation of a fast non-cryptographic hashing algorithm, comparable to xxHash or MetroHash.

https://gitlab.redox-os.org/redox-os/seahash

nico-abram · 2021-02-16T02:03:00Z

From a cursory look, it seems to have 5 unsafe blocks:

unaligned reads to turn a &[u8] into a u64
This looks fine to me
Maybe the big unsafe block could be turned into smaller blocks inside each match arm, and maybe the byte-sized reads can be turned into slice indexing operations instead of raw pointer reads (Since rustc might be able to elide the bounds checks, but I'm not sure)
It correctly uses as_ptr to get a pointer to the entire slice instead of a pointer to the first element, and it returns 0 for anything that does not satisfy the buf.len() < 8 assumption without triggering UB. It also correctly uses unaligned reads
Read a u64 from a *const u8
This also looks sound to me. It's an unsafe fn, although pherhaps the requirement that the pointer points to 8 bytes of data could be specified more explicitly in the doc comment for it
It correctly uses unaligned reads
A test case for (2)
Not much to say about this, looks fine
A 121 line unsafe block in State::hash
This has two parts:
First it iterates through the buffer using end and current raw pointers and reading 4 u64's using the function from (2). It obtains the 32-byte ending chunk pointer by doing buf.as_ptr().offset(buf.len() as isize & !0x1F) which looks fine to me but I would like someone else to double check. The u64 reads seem fine since (2) uses unaligned reads
After that it handles the remainder
It calculates the remaining bytes by doing let mut excessive = buf.len() as usize + buf.as_ptr() as usize - end_ptr as usize;. I'm assuming this is fine but I think it would be more clear if it has parenthesis around the addition to make it a bit more clear that this never underflows
Then it just hashes the remaining bytes using the functions from (1) and (2). As far as I can tell, all the match arms are sound.
A 97 line unsafe block in SeaHasher::push_bytes
I did not look into this. At a glance it seems to mostly consist of a copy_to_nonoverlapping call and then something similar to what's done in (4)

It does not have any dependencies, so there is nothing to audit with regards to deps

I also ran the tests under miri and they all passed

In summary, as far as I can tell the crate is sound (Outside of (5) which I did not look into), but can probably be improved a bit with comments and smaller unsafe blocks. Another thing we might wanna do is check if we can replace unsafe code with safe alternatives without performance regressions. The crate has criterion benchmarks which should make that easier to attempt

Shnatsel · 2021-02-16T15:10:51Z

I'm pretty sure point 1 can be refactored using u32::from_ne_bytes(buf[..4].try_into().unwrap()) instead of unaligned reads. Since the length is known in advance, the optimizer should elide bounds checks and the unreachable panic in .unwrap().

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Audit seahash #66

Audit seahash #66

chills42 commented Feb 25, 2020

nico-abram commented Feb 16, 2021 •

edited

Loading

Shnatsel commented Feb 16, 2021

Audit seahash #66

Audit seahash #66

Comments

chills42 commented Feb 25, 2020

nico-abram commented Feb 16, 2021 • edited Loading

Shnatsel commented Feb 16, 2021

nico-abram commented Feb 16, 2021 •

edited

Loading