gh-115704: Improve DJBX33A hash algorithm #115705
base: main
The following commit authors need to sign the Contributor License Agreement: |
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
77a701a to d951214
I'd like you to address Serhiy's concern on the issue as well. Could we also see the assembly code that is actually generated, along with a before/after comparison?
Python/pyhash.c
Outdated
hash = hash * 33 * 33 * 33 * 33 +
       p[0] * 33 * 33 * 33 +
       p[1] * 33 * 33 +
       p[2] * 33 +
       p[3];
I think you can directly compute those values (namely the powers of 33), although the compiler should be able to optimize it as well. You could however add a small comment.
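To make the suggestion concrete, here is a minimal sketch of a four-bytes-per-step update with the powers of 33 written out as constants, next to a byte-at-a-time reference. The function names, the `POW33_*` macros, the `uint64_t` hash type, and the bare 5381 seed are assumptions for illustration; this is not the code in `Python/pyhash.c`, which also mixes in other values.

```c
#include <assert.h>   /* for the self-check mentioned in the note below */
#include <stddef.h>
#include <stdint.h>

/* Powers of 33, spelled out so nobody has to multiply them by hand. */
#define POW33_1 33u
#define POW33_2 1089u      /* 33 * 33 */
#define POW33_3 35937u     /* 33 * 33 * 33 */
#define POW33_4 1185921u   /* 33 * 33 * 33 * 33 */

/* Reference: one byte per step, hash = hash * 33 + byte. */
static uint64_t djbx33a_bytewise(const unsigned char *p, size_t len)
{
    uint64_t hash = 5381;  /* assumed seed for this sketch */
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + p[i];
    }
    return hash;
}

/* Four bytes per step; the four byte products do not depend on each other,
 * only the single multiply of hash sits on the loop-carried chain. */
static uint64_t djbx33a_by4(const unsigned char *p, size_t len)
{
    uint64_t hash = 5381;  /* assumed seed for this sketch */
    while (len >= 4) {
        hash = hash * POW33_4
             + p[0] * (uint64_t)POW33_3
             + p[1] * (uint64_t)POW33_2
             + p[2] * (uint64_t)POW33_1
             + p[3];
        p += 4;
        len -= 4;
    }
    /* Remaining 0..3 bytes: plain byte steps. */
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + p[i];
    }
    return hash;
}
```

Both forms compute the same value modulo 2^64, so a check such as `assert(djbx33a_by4(s, n) == djbx33a_bytewise(s, n));` should hold for any input. Whether the constants change the generated code compared with the `33 * 33 * ...` spelling depends on the compiler, which will normally fold either form.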
Thank you for your review. Done.
Python/pyhash.c
Outdated
if (len >= 2) {
    if (len > 2) {
        hash = hash * 33 * 33 * 33 +
               p[0] * 33 * 33 +
               p[1] * 33 +
               p[2];
    }
    else {
        hash = hash * 33 * 33 + p[0] * 33 + p[1];
    }
}
Does the compiler behave differently if you use
if (len > 2) { ... }
else if (len == 2) { ... }
else if (len) { ... }
instead?
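For reference, the flat chain asked about above could look like the sketch below, shown beside a plain loop over the same tail bytes. `tail_flat` and `tail_loop` are hypothetical helper names, and `uint64_t` stands in for the real hash type; the sketch only covers a tail of 0 to 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold the last 0..3 bytes into hash using a flat
 * if / else-if chain instead of nested ifs. */
static uint64_t tail_flat(uint64_t hash, const unsigned char *p, size_t len)
{
    assert(len <= 3);
    if (len > 2) {
        hash = hash * 33 * 33 * 33 + p[0] * 33 * 33 + p[1] * 33 + p[2];
    }
    else if (len == 2) {
        hash = hash * 33 * 33 + p[0] * 33 + p[1];
    }
    else if (len) {
        hash = hash * 33 + p[0];
    }
    return hash;
}

/* Reference: the same tail folded one byte at a time. */
static uint64_t tail_loop(uint64_t hash, const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + p[i];
    }
    return hash;
}
```

The two helpers return the same value for every tail length, so any difference would show up only in the generated assembly, which is what the question is about.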
This PR is fundamentally different from Lemire's post and the PHP change. In PHP, the change was made in the implementation of the hash function for arbitrary lengths. In this PR, only the code for computing the hash of bytes with This might still be a worthwhile change, but there should be really strong benchmarking results that show this.
I support it only as a small cleanup: it gets rid of the ugly (IMO) C fall-through code and replaces it with a simple loop. I don't think the compiler will generate anything significantly different.
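As an illustration of the cleanup being described, the sketch below puts a fall-through `switch` next to the loop that would replace it. It only shows the pattern: the function names, the three-case limit, the `uint64_t` type, and the bare 5381 seed are assumptions, not the actual contents of `Python/pyhash.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fall-through style: each case handles one byte and drops into the next.
 * ((hash << 5) + hash) + byte is hash * 33 + byte. */
static uint64_t small_hash_switch(const unsigned char *p, size_t len)
{
    uint64_t hash = 5381;  /* assumed seed for this sketch */
    assert(len <= 3);      /* this sketch handles only very short inputs */
    switch (len) {
        case 3: hash = ((hash << 5) + hash) + *p++; /* fall through */
        case 2: hash = ((hash << 5) + hash) + *p++; /* fall through */
        case 1: hash = ((hash << 5) + hash) + *p++;
                break;
        default:
                break;
    }
    return hash;
}

/* Loop style: same arithmetic, no per-length cases. */
static uint64_t small_hash_loop(const unsigned char *p, size_t len)
{
    uint64_t hash = 5381;  /* assumed seed for this sketch */
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + p[i];
    }
    return hash;
}
```

With a small, known upper bound on `len`, an optimizing compiler may unroll the loop fully, which is consistent with the expectation that the output will not differ much.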
Accelerating python hash algorithm by "unoptimizing" it when using DJBX33A as hash algorithm. See Daniel Lemire's blog post: https://lemire.me/blog/2016/07/21/accelerating-php-hashing-by-unoptimizing-it/ Signed-off-by: PeterYang12 <[email protected]>
bf9cfa8 to 0364811
Accelerate Python's hash algorithm by "unoptimizing" it when DJBX33A is used as the hash algorithm.
See Daniel Lemire's blog post:
https://lemire.me/blog/2016/07/21/accelerating-php-hashing-by-unoptimizing-it/
This idea has already been implemented in the PHP interpreter.