Potential performance improvement in contains and strpos functions #14210

Open
Omega359 opened this issue Jan 20, 2025 · 1 comment · May be fixed by #14211
Labels: enhancement (New feature or request)

Comments

@Omega359
Contributor

Is your feature request related to a problem or challenge?

I came across a library, stringzilla, that uses SIMD to accelerate some string operations (primarily find, for DataFusion's use case). It seems to be better optimized for longer strings than the memchr::memmem::find that arrow currently uses. For smaller strings the memchr crate seems to perform better on my machine, but for longer strings stringzilla seems to outperform it quite significantly. The code changes required to support it are minimal, though I would like to see performance comparisons on other architectures (I ran the following on my i9-13900h laptop):

# cargo bench --bench contains --bench strpos -- --save-baseline main
# <switch to stringzilla branch>
# cargo bench --bench contains --bench strpos -- --save-baseline stringzilla
# critcmp main stringzilla
group                                          main                                    stringzilla
-----                                          ----                                    -----------
contains_StringArray_ascii_str_len_1024        32.47    16.1±0.95ms        ? ?/sec     1.00  496.2±114.11µs        ? ?/sec
contains_StringArray_ascii_str_len_128         16.00     3.4±0.14ms        ? ?/sec     1.00   214.3±71.34µs        ? ?/sec
contains_StringArray_ascii_str_len_32          1.00   185.0±24.42µs        ? ?/sec     1.01   187.2±20.89µs        ? ?/sec
contains_StringArray_ascii_str_len_4096        15.92    50.1±5.63ms        ? ?/sec     1.00      3.1±0.26ms        ? ?/sec
contains_StringArray_ascii_str_len_8           1.04     97.4±9.77µs        ? ?/sec     1.00     94.0±7.62µs        ? ?/sec
contains_StringArray_utf8_str_len_1024         26.04    24.8±2.85ms        ? ?/sec     1.00  951.0±112.52µs        ? ?/sec
contains_StringArray_utf8_str_len_128          14.07     5.1±0.44ms        ? ?/sec     1.00   362.2±85.63µs        ? ?/sec
contains_StringArray_utf8_str_len_32           1.50   649.9±72.47µs        ? ?/sec     1.00   432.1±63.06µs        ? ?/sec
contains_StringArray_utf8_str_len_4096         16.58    80.3±8.61ms        ? ?/sec     1.00      4.8±0.41ms        ? ?/sec
contains_StringArray_utf8_str_len_8            1.00   236.2±29.07µs        ? ?/sec     1.08   254.4±16.59µs        ? ?/sec
contains_StringViewArray_ascii_str_len_1024    31.25    16.1±0.85ms        ? ?/sec     1.00   515.8±99.46µs        ? ?/sec
contains_StringViewArray_ascii_str_len_128     17.33     3.5±0.36ms        ? ?/sec     1.00   204.8±20.87µs        ? ?/sec
contains_StringViewArray_ascii_str_len_32      1.06   192.6±27.58µs        ? ?/sec     1.00    182.2±9.15µs        ? ?/sec
contains_StringViewArray_ascii_str_len_4096    14.90    49.0±4.56ms        ? ?/sec     1.00      3.3±0.42ms        ? ?/sec
contains_StringViewArray_ascii_str_len_8       1.00     98.0±9.80µs        ? ?/sec     1.03    100.7±8.74µs        ? ?/sec
contains_StringViewArray_utf8_str_len_1024     22.66    24.2±2.08ms        ? ?/sec     1.00  1068.3±369.70µs        ? ?/sec
contains_StringViewArray_utf8_str_len_128      14.90     5.1±0.49ms        ? ?/sec     1.00   343.2±32.31µs        ? ?/sec
contains_StringViewArray_utf8_str_len_32       1.49   623.7±43.61µs        ? ?/sec     1.00   417.7±31.81µs        ? ?/sec
contains_StringViewArray_utf8_str_len_4096     15.93    79.7±8.05ms        ? ?/sec     1.00      5.0±0.63ms        ? ?/sec
contains_StringViewArray_utf8_str_len_8        1.00   246.2±52.73µs        ? ?/sec     1.13   277.5±31.61µs        ? ?/sec
strpos_StringArray_ascii_str_len_1024          9.96      9.8±1.15ms        ? ?/sec     1.00  986.1±116.33µs        ? ?/sec
strpos_StringArray_ascii_str_len_128           3.39  1335.0±145.51µs        ? ?/sec    1.00   393.3±40.23µs        ? ?/sec
strpos_StringArray_ascii_str_len_32            1.06   420.7±92.47µs        ? ?/sec     1.00   396.0±57.32µs        ? ?/sec
strpos_StringArray_ascii_str_len_4096          5.85     40.6±3.68ms        ? ?/sec     1.00      6.9±0.84ms        ? ?/sec
strpos_StringArray_ascii_str_len_8             1.07   150.4±12.31µs        ? ?/sec     1.00   141.0±10.02µs        ? ?/sec
strpos_StringArray_utf8_str_len_1024           20.27    26.7±2.58ms        ? ?/sec     1.00  1316.5±127.39µs        ? ?/sec
strpos_StringArray_utf8_str_len_128            6.71      3.9±0.27ms        ? ?/sec     1.00   586.3±88.39µs        ? ?/sec
strpos_StringArray_utf8_str_len_32             2.45  1653.1±158.90µs        ? ?/sec    1.00   673.5±62.95µs        ? ?/sec
strpos_StringArray_utf8_str_len_4096           14.90   101.7±4.95ms        ? ?/sec     1.00      6.8±0.79ms        ? ?/sec
strpos_StringArray_utf8_str_len_8              1.82   713.9±45.84µs        ? ?/sec     1.00   392.3±44.63µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_1024      7.73      9.0±1.26ms        ? ?/sec     1.00  1167.2±114.81µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_128       2.59  1240.3±130.47µs        ? ?/sec    1.00   478.0±48.19µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_32        1.00   395.4±44.10µs        ? ?/sec     1.16   457.0±47.80µs        ? ?/sec
strpos_StringViewArray_ascii_str_len_4096      4.42     36.2±2.29ms        ? ?/sec     1.00      8.2±1.16ms        ? ?/sec
strpos_StringViewArray_ascii_str_len_8         1.00   192.9±14.14µs        ? ?/sec     1.06   204.2±18.46µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_1024       17.67    27.0±2.46ms        ? ?/sec     1.00  1526.3±205.96µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_128        6.85      4.1±0.40ms        ? ?/sec     1.00   593.2±51.54µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_32         2.49  1697.1±319.77µs        ? ?/sec    1.00   680.8±50.25µs        ? ?/sec
strpos_StringViewArray_utf8_str_len_4096       13.95   103.3±7.00ms        ? ?/sec     1.00      7.4±0.91ms        ? ?/sec
strpos_StringViewArray_utf8_str_len_8          1.93  776.5±127.84µs        ? ?/sec     1.00   401.6±78.46µs        ? ?/sec

Describe the solution you'd like

Incorporate stringzilla into DataFusion functions where appropriate.
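To make "where appropriate" a little more concrete, here is a minimal sketch of one possible shape: pick the search backend by haystack length, since the benchmark numbers suggest the two libraries trade places somewhere between 32 and 128 bytes on this one machine. Everything named here (`Searcher`, `FindFn`, `naive_find`, `LONG_HAYSTACK_THRESHOLD`) is hypothetical and not taken from DataFusion, arrow, or the linked PR. The backend is a std-only stand-in so the sketch compiles without extra crates; a real change would presumably plug in `memchr::memmem::find` and a stringzilla-based search instead.

```rust
/// Shared shape of a substring-search backend: byte offset of the first match.
type FindFn = fn(&[u8], &[u8]) -> Option<usize>;

/// Std-only stand-in backend (hypothetical). A real build would substitute
/// `memchr::memmem::find` or a stringzilla-based search with the same shape.
fn naive_find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        // An empty needle matches at the start; `windows(0)` would panic.
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Hypothetical cutoff in bytes. It is a guess and would need tuning
/// per architecture.
const LONG_HAYSTACK_THRESHOLD: usize = 64;

/// Dispatches to one of two backends based on haystack length.
struct Searcher {
    short: FindFn,
    long: FindFn,
}

impl Searcher {
    /// Byte offset of the first match, if any.
    fn find(&self, haystack: &str, needle: &str) -> Option<usize> {
        let backend = if haystack.len() >= LONG_HAYSTACK_THRESHOLD {
            self.long
        } else {
            self.short
        };
        backend(haystack.as_bytes(), needle.as_bytes())
    }

    /// `contains`-style result: only whether a match exists.
    fn contains(&self, haystack: &str, needle: &str) -> bool {
        self.find(haystack, needle).is_some()
    }

    /// `strpos`-style result: 1-based character position, 0 when absent.
    fn strpos(&self, haystack: &str, needle: &str) -> usize {
        match self.find(haystack, needle) {
            // A match of a valid UTF-8 needle starts on a char boundary,
            // so slicing the haystack at the byte offset is safe.
            Some(byte_offset) => haystack[..byte_offset].chars().count() + 1,
            None => 0,
        }
    }
}
```

A caller would build `Searcher { short: naive_find, long: naive_find }` (swapping in the real backends) and call `contains` or `strpos` per row. Whether a length check per value pays for itself is exactly the kind of thing the benchmarks on other architectures would need to show.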

Describe alternatives you've considered

Leave the code as is.

Additional context

No response

Omega359 added the enhancement label on Jan 20, 2025
@Omega359
Contributor Author

take
