Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audit seahash #66

Open
chills42 opened this issue Feb 25, 2020 · 2 comments
Open

Audit seahash #66

chills42 opened this issue Feb 25, 2020 · 2 comments

Comments

@chills42
Copy link

Seahash is an implementation of a fast non-cryptographic hashing algorithm, comparable to xxHash or MetroHash.

https://gitlab.redox-os.org/redox-os/seahash

@nico-abram
Copy link

nico-abram commented Feb 16, 2021

From a cursory look, it seems to have 5 unsafe blocks:

  1. unaligned reads to turn a &[u8] into a u64
    This looks fine to me
    Maybe the big unsafe block could be turned into smaller blocks inside each match arm, and maybe the byte-sized reads can be turned into slice indexing operations instead of raw pointer reads (Since rustc might be able to elide the bounds checks, but I'm not sure)
    It correctly uses as_ptr to get a pointer to the entire slice instead of a pointer to the first element, and it returns 0 for anything that does not satisfy the buf.len() < 8 assumption without triggering UB. It also correctly uses unaligned reads
  2. Read a u64 from a *const u8
    This also looks sound to me. It's an unsafe fn, although pherhaps the requirement that the pointer points to 8 bytes of data could be specified more explicitly in the doc comment for it
    It correctly uses unaligned reads
  3. A test case for (2)
    Not much to say about this, looks fine
  4. A 121 line unsafe block in State::hash
    This has two parts:
    First it iterates through the buffer using end and current raw pointers and reading 4 u64's using the function from (2). It obtains the 32-byte ending chunk pointer by doing buf.as_ptr().offset(buf.len() as isize & !0x1F) which looks fine to me but I would like someone else to double check. The u64 reads seem fine since (2) uses unaligned reads
    After that it handles the remainder
    It calculates the remaining bytes by doing let mut excessive = buf.len() as usize + buf.as_ptr() as usize - end_ptr as usize;. I'm assuming this is fine but I think it would be more clear if it has parenthesis around the addition to make it a bit more clear that this never underflows
    Then it just hashes the remaining bytes using the functions from (1) and (2). As far as I can tell, all the match arms are sound.
  5. A 97 line unsafe block in SeaHasher::push_bytes
    I did not look into this. At a glance it seems to mostly consist of a copy_to_nonoverlapping call and then something similar to what's done in (4)

It does not have any dependencies, so there is nothing to audit with regards to deps

I also ran the tests under miri and they all passed

In summary, as far as I can tell the crate is sound (Outside of (5) which I did not look into), but can probably be improved a bit with comments and smaller unsafe blocks. Another thing we might wanna do is check if we can replace unsafe code with safe alternatives without performance regressions. The crate has criterion benchmarks which should make that easier to attempt

@Shnatsel
Copy link
Member

I'm pretty sure point 1 can be refactored using u32::from_ne_bytes(buf[..4].try_into().unwrap()) instead of unaligned reads. Since the length is known in advance, the optimizer should elide bounds checks and the unreachable panic in .unwrap().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants