-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize comparison operators #269
base: master
Are you sure you want to change the base?
Conversation
@chfast, let me know what you find out please :-) |
Preliminary benchmarks:
So this looks solid for compare operators directly and seems to help the shifts as well, although this may be code layout effect. |
Thanks, glad to hear :-) |
Confirmed, this does not affect assembly of shift operators. Proof: https://godbolt.org/z/4b6cT4rje. |
Yes, makes sense. When I played with this repo a while back, I noticed that the benchmarks had variability from one run to the next, so I'm not sure how reliable the comparison truly is. |
Codecov Report
@@ Coverage Diff @@
## master #269 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 10 10
Lines 1941 1937 -4
=========================================
- Hits 1941 1937 -4
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@chfast should this be merged? |
On the latest compilers I don't see much difference so I'm not very much in favor because this implementation has 4 branches while the original one is branchless. But people are welcome to drop their own benchmarking results (use "compare" filter). |
Why does it matter that it is branchless if it is slower? |
I will try to clean up the situation.
Here you can see the assembly output for each: https://godbolt.org/z/WY3qqzrET AnalysisIn the worst case every variant need to load all 8 words. Additionally, "sub" executes 4 subtract instructions (independent of input). The "ne" executes 4 BenchmarksI added all variants of implementations in #277 and benchmarked them on two CPUs. There are 5 benchmark cases:
Haswell
Skylake
Zen3This is noisy machine with AMD CPU. For this case I used
Xeon Platinum 8272CL
From the results we should rather consider switching back to "sub" implementation. I may do some other benchmarks if I have time. |
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
No description provided.