gh-115704: Improve DJBX33A hash algorithm #115705

PeterYang12 · 2024-02-20T09:24:03Z

Accelerate Python's hash algorithm by "unoptimizing" it when DJBX33A is used as the hash algorithm.
See Daniel Lemire's blog post:
https://lemire.me/blog/2016/07/21/accelerating-php-hashing-by-unoptimizing-it/
The same idea has already been implemented in the PHP interpreter.
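
For readers unfamiliar with the trick, here is a minimal sketch of the idea, not the code in this PR. The function names are hypothetical, and the seed is left as a parameter. The serial form makes every step wait for the previous one; the multiplied-out form computes the same value modulo 2^64 but exposes four independent byte terms per step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Classic DJBX33A: each iteration depends on the previous hash value. */
static uint64_t djbx33a_serial(const unsigned char *p, size_t len, uint64_t hash)
{
    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        hash = ((hash << 5) + hash) + p[i];   /* same as hash * 33 + p[i] */
    }
    return hash;
}

/* "Unoptimized" form: four bytes per step with explicit multiplications.
 * The four byte terms do not depend on each other, so the CPU can
 * evaluate them in parallel. */
static uint64_t djbx33a_unrolled(const unsigned char *p, size_t len, uint64_t hash)
{
    assert(p != NULL || len == 0);
    while (len >= 4) {
        hash = hash * 33 * 33 * 33 * 33
             + p[0] * 33 * 33 * 33
             + p[1] * 33 * 33
             + p[2] * 33
             + p[3];
        p += 4;
        len -= 4;
    }
    while (len > 0) {                          /* 0..3 leftover bytes */
        hash = hash * 33 + *p++;
        len--;
    }
    return hash;
}
```

Both functions should return identical results for any input, since unsigned 64-bit arithmetic wraps consistently in either form.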

Accelerate DJBX33A hash algorithm by "unoptimizing" it #115704

cpython-cla-bot · 2024-02-20T09:24:06Z

The following commit authors need to sign the Contributor License Agreement:

[email protected]

bedevere-app · 2024-02-20T09:24:10Z

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

picnixz

I'd like you to address Serhiy's concern on the issue as well. Could we also see the assembly code that is actually generated, along with a before/after comparison?

picnixz · 2024-08-25T09:07:49Z

Python/pyhash.c

+            hash = hash * 33 * 33 * 33 * 33 +
+                   p[0] * 33 * 33 * 33 +
+                   p[1] * 33 * 33 +
+                   p[2] * 33 +
+                   p[3];


I think you can directly compute those values (namely the powers of 33), although the compiler should be able to optimize it as well. You could however add a small comment.

Thank you for your review. Done.
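
To illustrate the suggestion, a sketch with the powers of 33 written out as commented constants. The macro and function names are hypothetical and are not taken from the PR; the spelled-out version is kept alongside for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Powers of 33 used when folding four bytes in one step
 * (hypothetical macro names). */
#define DJB_33_POW2 1089u      /* 33 * 33 */
#define DJB_33_POW3 35937u     /* 33 * 33 * 33 */
#define DJB_33_POW4 1185921u   /* 33 * 33 * 33 * 33 */

/* Spelled-out multiplications, as in the diff above. */
static uint64_t mix4_spelled(uint64_t hash, const unsigned char *p)
{
    assert(p != NULL);
    return hash * 33 * 33 * 33 * 33
         + p[0] * 33 * 33 * 33
         + p[1] * 33 * 33
         + p[2] * 33
         + p[3];
}

/* Same computation with the precomputed constants. */
static uint64_t mix4_constants(uint64_t hash, const unsigned char *p)
{
    assert(p != NULL);
    return hash * DJB_33_POW4
         + p[0] * DJB_33_POW3
         + p[1] * DJB_33_POW2
         + p[2] * 33u
         + p[3];
}
```

An optimizing compiler will most likely fold the spelled-out products into the same constants, so the benefit here is mainly readability.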

picnixz · 2024-08-25T09:13:00Z

Python/pyhash.c

+        if (len >= 2) {
+            if (len > 2) {
+                hash = hash * 33 * 33 * 33 +
+                       p[0] * 33 * 33 +
+                       p[1] * 33 +
+                       p[2];
+            }
+            else {
+                hash = hash * 33 * 33 + p[0] * 33 + p[1];
+            }
+        }


Does the compiler behave differently if you use

if (len > 2) { ... } else if (len == 2) { ... } else if (len) { ... }

instead?
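
For comparison, a sketch of both shapes side by side. The function names are hypothetical, and the `len == 1` branch of the nested version is an assumption, since that part of the hunk is not shown above. Both are meant to handle a tail of 0 to 3 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nested form, following the hunk above. */
static uint64_t tail_nested(uint64_t hash, const unsigned char *p, size_t len)
{
    assert(len <= 3);
    if (len >= 2) {
        if (len > 2) {
            hash = hash * 33 * 33 * 33 +
                   p[0] * 33 * 33 +
                   p[1] * 33 +
                   p[2];
        }
        else {
            hash = hash * 33 * 33 + p[0] * 33 + p[1];
        }
    }
    else if (len == 1) {
        /* Assumed: the single-byte case is not visible in the hunk. */
        hash = hash * 33 + p[0];
    }
    return hash;
}

/* Flat else-if chain, as proposed in the question. */
static uint64_t tail_flat(uint64_t hash, const unsigned char *p, size_t len)
{
    assert(len <= 3);
    if (len > 2) {
        hash = hash * 33 * 33 * 33 + p[0] * 33 * 33 + p[1] * 33 + p[2];
    }
    else if (len == 2) {
        hash = hash * 33 * 33 + p[0] * 33 + p[1];
    }
    else if (len) {
        hash = hash * 33 + p[0];
    }
    return hash;
}
```

The two are semantically identical; whether the generated branches differ would need to be checked in the compiler output for the targets of interest.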

cfbolz · 2024-08-25T10:36:26Z

This PR is fundamentally different from the Lemire post and the PHP change. In PHP, the change was made in the hash function for inputs of arbitrary length. In this PR, only the code that hashes bytes of length <= 7 is changed. CPython's hash function for arbitrary lengths already uses a direct multiplication and operates on several characters at a time.

This might still be a worthwhile change, but there should be really strong benchmarking results that show this.

jxu · 2024-08-26T21:29:08Z

I support it only to clean up a little by getting rid of the ugly (IMO) C fall-through code and replacing it with a simple loop. I don't think the compiler will generate anything significantly different.
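
As a rough illustration of that cleanup, a sketch contrasting a fall-through `switch` with a plain loop for inputs of 1 to 7 bytes. It is modeled on the pattern discussed here rather than copied from `Python/pyhash.c`, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fall-through style: jump to the case for `len`, then run every
 * case below it, consuming one byte per case. */
static uint64_t small_hash_switch(const unsigned char *p, size_t len, uint64_t hash)
{
    assert(len >= 1 && len <= 7);
    switch (len) {
        case 7: hash = ((hash << 5) + hash) + *p++; /* fall through */
        case 6: hash = ((hash << 5) + hash) + *p++; /* fall through */
        case 5: hash = ((hash << 5) + hash) + *p++; /* fall through */
        case 4: hash = ((hash << 5) + hash) + *p++; /* fall through */
        case 3: hash = ((hash << 5) + hash) + *p++; /* fall through */
        case 2: hash = ((hash << 5) + hash) + *p++; /* fall through */
        case 1: hash = ((hash << 5) + hash) + *p++; break;
        default: break; /* not reached given the assert above */
    }
    return hash;
}

/* Loop style: the same recurrence, one byte per iteration. */
static uint64_t small_hash_loop(const unsigned char *p, size_t len, uint64_t hash)
{
    assert(len >= 1 && len <= 7);
    for (size_t i = 0; i < len; i++) {
        hash = hash * 33 + p[i];
    }
    return hash;
}
```

Both compute the same value; any difference in the generated code would have to be confirmed by inspecting the assembly.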

PeterYang12 requested review from gpshead and tiran as code owners February 20, 2024 09:24

bedevere-app bot added the awaiting review label Feb 20, 2024

bedevere-app bot mentioned this pull request Feb 20, 2024

Accelerate DJBX33A hash algorithm by "unoptimizing" it #115704

Open

gpshead self-assigned this Feb 22, 2024

PeterYang12 force-pushed the accelerate_DJBX33A branch from 77a701a to d951214 Compare February 27, 2024 08:09

gpshead removed their assignment Aug 25, 2024

picnixz reviewed Aug 25, 2024

PeterYang12 force-pushed the accelerate_DJBX33A branch from bf9cfa8 to 0364811 Compare August 27, 2024 05:52
