Is your feature request related to a problem or challenge?
I came across a library, stringzilla, that accelerates some string operations (primarily find, for DataFusion's use case) using SIMD. It seems to be better optimized for larger strings than the memchr::memmem::find currently used by arrow. On my machine the memchr crate performs better for smaller strings, but for longer strings stringzilla outperforms it quite significantly. The code changes required to support it are minimal, though I would like to see performance comparisons on other architectures (I ran the following on my i9-13900H laptop)
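For anyone who wants to compare on a different architecture, a minimal timing harness might look like the sketch below. It is not the benchmark referred to above. It is a standard-library-only illustration, and `naive_find`, `make_haystack`, `time_find`, the haystack sizes, and the iteration count are all hypothetical. The `FindFn` signature matches `memchr::memmem::find`; a stringzilla-backed searcher is assumed to be wrappable in the same shape.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Signature shared by the substring searchers under comparison.
/// `memchr::memmem::find` has this shape; other backends may need a thin wrapper.
type FindFn = fn(&[u8], &[u8]) -> Option<usize>;

/// Hypothetical std-only baseline so the sketch runs without extra crates.
fn naive_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Builds a haystack of `len` bytes with the needle placed at the very end,
/// so the searcher has to scan the whole input.
fn make_haystack(len: usize, needle: &[u8]) -> Vec<u8> {
    assert!(len >= needle.len());
    let mut h = vec![b'a'; len - needle.len()];
    h.extend_from_slice(needle);
    h
}

/// Runs `find` `iters` times; returns the total elapsed time and the last result.
fn time_find(
    find: FindFn,
    haystack: &[u8],
    needle: &[u8],
    iters: u32,
) -> (Duration, Option<usize>) {
    let mut last = None;
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from hoisting or eliding the call.
        last = black_box(find(black_box(haystack), black_box(needle)));
    }
    (start.elapsed(), last)
}

fn main() {
    let needle = b"needle";
    // Sweep short to long inputs, since the crossover point is what matters here.
    for len in [16usize, 256, 4096, 65536, 1 << 20] {
        let haystack = make_haystack(len, needle);
        let (elapsed, pos) = time_find(naive_find, &haystack, needle, 100);
        println!("len={len:>8} pos={pos:?} total={elapsed:?}");
    }
}
```

Swapping `naive_find` for each real backend and running the same sweep would show where the crossover between short and long strings falls on a given machine. For numbers worth acting on, a proper harness such as criterion would be a better choice than this loop.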
Describe the solution you'd like
Incorporate stringzilla into DataFusion functions where appropriate.
Describe alternatives you've considered
Leave the code as is.
Additional context
No response