Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http.headers module needs a better hash function #4

Open
daurnimator opened this issue Jun 27, 2019 · 7 comments
Open

http.headers module needs a better hash function #4

daurnimator opened this issue Jun 27, 2019 · 7 comments

Comments

@daurnimator
Copy link
Contributor

The http.headers type HeaderIndex currently uses AutoHashMap, it should use a better hash function, possibly parametized.

Originally posted by @daurnimator in #2263

@dropwhile
Copy link

dropwhile commented Jun 29, 2019

Several other languages use siphash-2-4 (or siphash-1-3), with randomized key on startup, as an attempt to avoid denial-of-service attacks (which is important for http servers).

refs:

@daurnimator
Copy link
Contributor Author

I'd be happy with that. Zig even already has a SipHash implementation, so it should just be a matter of using it in the http.headers module.
@cactus do you want to work on this?

@dropwhile
Copy link

@daurnimator Looking into it a bit, SipHash returns a u64, but std.HashMap expects a hash function to return a u32.

Additionally, Headers.init currently does not allow failure. I think that would need to change, as siphash needs a hash key (Siphash.hash), which would likely best be generated from crypto.randomBytes (which is an alias for std.os.getrandom) or maybe one of the PRNGs (prngs need a seed, so randombytes would need to be called anyway?).

Since Siphash.hash requires a hash key along with input, this seems to make its use in std.HashMap a bit awkward, as std.HashMap wants comptime hash: fn (key: K) u32 but SipHash.has provides pub fn hash(key: []const u8, input: []const u8) T. Not exactly sure how to make the two mesh cleanly... I have an idea but it seems a bit awkward -- basically a wrapper around SipHash that converts it to the required api signature.

Finally, std.HashMap returns a type, which makes it a bit confusing for me to figure out how to reference it's return type as a param for the headers struct. Would you just specify type as the type?
I was reading the docs and it said that a fn that retuns a struct (anonymous struct), the struct will be named the same as the function. In that case, would something like this be recommended?

// this would return something with the right /signature/, even though we wouldn't use it...
const HeaderIndex = std.AutoHashMap([]const u8, HeaderIndexList);

pub const Headers = struct {
    // the owned header field name is stored in the index as part of the key
    allocator: *Allocator,
    data: HeaderList,
    index: HeaderIndex,  // <-- using the signature
    // this is a hypothetical wrapper around siphash to deal with the 3rd issue noted above
    hasher: HeaderHasher,

    const Self = @This();

    pub fn init(allocator: *Allocator) !Self {
        const hasher = try HeaderHasher.init(allocator);
        // same signature as AutoHashMap would return? 
        const index_hashmap = std.HashMap(
            []const u8,
            HeaderIndexList, 
            hasher.hash, 
            std.getAutoEqlFn([]const u8),
        );

        return Self{
            .allocator = allocator,
            .data = HeaderList.init(allocator),
            .index = index_hashmap.init(allocator),
            .hasher = hasher,
        };
    }

Any thoughts?

@daurnimator
Copy link
Contributor Author

SipHash returns a u64, but std.HashMap expects a hash function to return a u32.

Just @truncate the result.

Additionally, Headers.init currently does not allow failure. I think that would need to change, as siphash needs a hash key (Siphash.hash), which would likely best be generated from crypto.randomBytes (which is an alias for std.os.getrandom) or maybe one of the PRNGs (prngs need a seed, so randombytes would need to be called anyway?).

Have Headers.init take a seed argument and then just pass it along to siphash.
You could add a second constructor Headers.initWithRandom that can fail and calls std.os.getrandom

Finally, std.HashMap returns a type, which makes it a bit confusing for me to figure out how to reference it's return type as a param for the headers struct. Would you just specify type as the type?

Have index_hashmap at the top level. You should be able to get the hasher.hash function via e.g. HeaderHasher.hash instead.


Also have a look at open PR ziglang/zig#2797

@dropwhile
Copy link

SipHash returns a u64, but std.HashMap expects a hash function to return a u32.

Just @truncate the result.

Ah. I guess that works.

Additionally, Headers.init currently does not allow failure. I think that would need to change, as siphash needs a hash key (Siphash.hash), which would likely best be generated from crypto.randomBytes (which is an alias for std.os.getrandom) or maybe one of the PRNGs (prngs need a seed, so randombytes would need to be called anyway?).

Have Headers.init take a seed argument and then just pass it along to siphash.
You could add a second constructor Headers.initWithRandom that can fail and calls std.os.getrandom

Ok.

Finally, std.HashMap returns a type, which makes it a bit confusing for me to figure out how to reference it's return type as a param for the headers struct. Would you just specify type as the type?

Have index_hashmap at the top level. You should be able to get the hasher.hash function via e.g. HeaderHasher.hash instead.

Not sure that will work, as the index_hashmap has to exist first or the type signature will be wrong due to the .hash() method taking a self argument in my case?

Indeed, I get this:

error: expected 2 arguments, found 3

I'll look at it further though.

Also have a look at open PR ziglang/zig#2797

I'm not very familiar with Wyhash and how well it avoids the DoS issues that SipHash attempts to avoid. Looks like the author there ran into the same issue with the seed, and just hardcoded for now.
I'll definitely keep an eye on that PR though, thanks for pointing it out.

@Sahnvour
Copy link

I'm not very familiar with Wyhash and how well it avoids the DoS issues that SipHash attempts to avoid. Looks like the author there ran into the same issue with the seed, and just hardcoded for now.

Wyhash source says nothing about DoS so I would not assume it is secure in this regard. My reasonning in this PR was that the usecase hashmaps have to be optimized for, ie. the most common, is small keys. And that applications needing specific hash functions will anyway use their own. In this case, we don't really care about the seed being hardcoded, as it's a best effort without strong guaranties.

However it can be debated wether we would instead want the default hash be a best effort for security.

@peteroupc
Copy link

peteroupc commented Feb 10, 2020

With respect to hash tables, hash functions, and security, I am aware of Reini Urban's (@rurban) comments at the rurban/smhasher repository, which may be relevant.

@andrewrk andrewrk transferred this issue from ziglang/zig Sep 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants