-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arm64 specific support for lj_new_str random hash optimization #21
base: v2.1-agentzh
Are you sure you want to change the base?
Conversation
src/lj_str.c
Outdated
@@ -162,8 +162,11 @@ MSize | |||
lj_str_indep_hash(GCstr *str) { | |||
return lj_str_original_hash(strdata(str), str->len); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: needs 2 blank lines here like before.
@@ -50,7 +50,7 @@ CCOPT= -O2 -fomit-frame-pointer | |||
CCOPT_x86= -march=i686 -msse -msse2 -mfpmath=sse | |||
CCOPT_x64= | |||
CCOPT_arm= | |||
CCOPT_arm64= | |||
CCOPT_arm64= -march=armv8-a+crc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this breaks compatibility with other arm64 variants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "+crc" option is needed to inform the compiler that crc instruction valid for this arch.
The proper fix might be to check if crc support is available in hardware at runtime .
Modified the code to take care of above.
v = (v << 8) | len; | ||
return __crc32cw(h, v); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, on X86-64, if the string being hashed having length under 4, original hash function seem to be working better.
Other than that, LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my tests on ARM64 platform the micro benchmark did not show any noticeable differences for below 4 bytes.
@siddhesh Will you mind having a quick look at this one? ;) Many thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There isn't a lot of difference in algorithms between the x86_64 and arm64 versions, so it is probably a much better idea to move lj_str_hash_x64.h as lj_str_hash.h and then just add #defines to define the various crc32 macros and the additional platform feature check for arm64.
That will make sure that future maintenance of these functions is easy.
LJ_STR_HASH = lj_str_hash; | ||
} | ||
else { | ||
LJ_STR_HASH = lj_str_original_hash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest: Initialize LJ_STR_HASH to lj_str_hash and only if HWCAP_CRC32 is not set, reset it to lj_str_original_hash over here.
@siddhesh Thanks for looking into this! The original author of this PR is no longer responding it seems. Will you please create a separate PR so that we can merge the final version quickly? Many thanks! |
This ports the random hash optimization done for lj_new_str() for x64 ( in 7923c63 ) to arm64.
Verified on Arm64 platform with the existing benchmark developed as part of the x64 commit above - shows similar improvements as seen for x64 with this benchmark.
However not sure about the effect on any real time workloads.