-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strpos and contains performance improvements #14211
Conversation
I would really appreciate it if reviewers could verify the performance improvements on any hardware they have available - mac, deployment servers, etc using the cargo benchmarks listed in #14210 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Omega359 -- this is really cool to see this level of attention to performance
I think @samuelcolvin (and maybe @adriangb ) did some pretty hard core optimization over in arrow-rs recently on
@@ -286,11 +286,17 @@ jobs: | |||
uses: ./.github/actions/setup-builder | |||
with: | |||
rust-version: stable | |||
- name: Install emscripten dependency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this needed for?
rand = { workspace = true } | ||
regex = { workspace = true, optional = true } | ||
sha2 = { version = "^0.10.1", optional = true } | ||
stringzilla = { version = "3", optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My biggest concern is these new library dependencies as our dependency. I haven't looked at it in detail but I think we need to evaluate how likely it is for it to be maintained, etc
https://github.com/ashvardanian/stringzilla looks pretty solild though after a quick look
Also, I think we should be planning to upstream as many of the low level operations (like string contains) as possible to arrow-rs (so we can have a larger user base to help improve them)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think @samuelcolvin 's experience was that the different implemnetations were faster/slower on some architectures / cahce sizes. So unless we have significant evidence this is faster for most/all platforms I would be hesitant to include it
@@ -0,0 +1,162 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to break this benchmark into its own PR?
This is mostly a selfish ask as I have comparison scripts for performance that compare performance against main, and they only work if the benchmark is on main as well
Which issue does this PR close?
Rationale for this change
Perf.
What changes are included in this PR?
Code.
Are these changes tested?
Yes, via existing tests.
Are there any user-facing changes?
No.