Improve parquet reader very-long string performance #17773
Conversation
Could you post the impact of the change on the benchmarks? Not required to merge IMO, but it's nice to keep such results available long-term.
Looks good. I really like the simplification in calc_threads_per_string_log2.
Done
This reverts commit f1542d3.
@pmattione-nvidia I cancelled the most recent workflow to free up resources and unblock all of cudf CI for this PR: #17771. I'll rerun once #17771 is merged.
/merge
The previous strings PR significantly reduced parquet reader performance for very long strings (lengths of ~1024 and longer). This PR fixes the regression by capping each memcpy at 8 bytes at a time (the length that yielded the best performance). In addition, up to all of the threads in a block can now cooperate on the same string, rather than being limited to the threads of a single warp.
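To illustrate the idea, here is a minimal standalone CUDA sketch, not the actual cudf kernel: every thread in the block strides across one long string and copies at most 8 bytes per memcpy. The kernel name, constant, and launch configuration below are assumptions for illustration only.

```cuda
// Illustrative sketch of block-wide copying of one long string, with each
// per-thread memcpy capped at 8 bytes. Not the cudf implementation.
#include <cuda_runtime.h>
#include <cstdio>

constexpr int MAX_COPY_BYTES = 8;  // cap each per-thread memcpy at 8 bytes

__global__ void copy_long_string(char* dst, const char* src, size_t len)
{
  // All threads in the block stride across the string in 8-byte chunks.
  size_t const start  = static_cast<size_t>(threadIdx.x) * MAX_COPY_BYTES;
  size_t const stride = static_cast<size_t>(blockDim.x) * MAX_COPY_BYTES;
  for (size_t pos = start; pos < len; pos += stride) {
    size_t const remaining = len - pos;
    size_t const n = remaining < MAX_COPY_BYTES ? remaining : MAX_COPY_BYTES;
    memcpy(dst + pos, src + pos, n);  // device-side memcpy of at most 8 bytes
  }
}

int main()
{
  constexpr size_t len = 64 * 1024;  // a "very long" string
  char *d_src, *d_dst;
  cudaMalloc(&d_src, len);
  cudaMalloc(&d_dst, len);
  cudaMemset(d_src, 'a', len);

  copy_long_string<<<1, 256>>>(d_dst, d_src, len);  // one block handles one string
  cudaDeviceSynchronize();

  char host[16] = {};
  cudaMemcpy(host, d_dst, 8, cudaMemcpyDeviceToHost);
  printf("first bytes: %.8s\n", host);

  cudaFree(d_src);
  cudaFree(d_dst);
  return 0;
}
```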
PERFORMANCE:
- Short strings: unchanged
- Length 1024: 25% faster
- Longer lengths (up to 64k): up to 90% faster, the same as before the strings PR
Checklist