Improve parquet reader very-long string performance #17773

Merged: 16 commits, Jan 28, 2025

Conversation

pmattione-nvidia
Contributor

@pmattione-nvidia pmattione-nvidia commented Jan 21, 2025

The previous strings PR significantly degraded parquet reader performance for very long strings (lengths of roughly 1024 and up). This PR fixes that regression by capping each memcpy at 8 bytes at a time (this length yielded the best performance). In addition, up to all of the threads in a block can now work on the same string, rather than at most the threads of a single warp.

PERFORMANCE:
Short strings: Unchanged
Length 1024: 25% faster
Longer lengths (up to 64k): Up to 90% faster, back to the performance from before the previous strings PR
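
As a rough illustration of the copy scheme described above, here is a host-side C++ sketch. It is not the libcudf kernel: the names `kMaxCopyBytes`, `copy_chunks_for_thread`, and `cooperative_copy` are hypothetical, and the "threads" are simulated by a sequential loop rather than running concurrently on a GPU. It only shows the access pattern, assuming each thread takes chunks of at most 8 bytes, strided by the number of cooperating threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical cap on bytes moved per copy step (the PR reports 8 as fastest).
constexpr std::size_t kMaxCopyBytes = 8;

// Work done by one simulated thread: copy every num_threads-th chunk of at
// most kMaxCopyBytes bytes, starting with the chunk whose index is thread_id.
void copy_chunks_for_thread(char* dst, char const* src, std::size_t len,
                            std::size_t thread_id, std::size_t num_threads)
{
  assert(num_threads > 0 && thread_id < num_threads);
  std::size_t const stride = num_threads * kMaxCopyBytes;
  for (std::size_t pos = thread_id * kMaxCopyBytes; pos < len; pos += stride) {
    // The final chunk of the string may be shorter than the cap.
    std::size_t const remaining = len - pos;
    std::size_t const n = remaining < kMaxCopyBytes ? remaining : kMaxCopyBytes;
    std::memcpy(dst + pos, src + pos, n);
  }
}

// Host-side stand-in for a group of cooperating threads: each simulated
// thread runs in turn, and together they cover the whole string.
std::string cooperative_copy(std::string const& src, std::size_t num_threads)
{
  std::string dst(src.size(), '\0');
  for (std::size_t t = 0; t < num_threads; ++t) {
    copy_chunks_for_thread(&dst[0], src.data(), src.size(), t, num_threads);
  }
  return dst;
}
```

In this sketch, raising `num_threads` from a warp-sized group (32) to a block-sized group (128 is assumed here purely as an example) shortens the per-thread loop for long strings, which is the intuition behind letting a whole block work on one string.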

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@pmattione-nvidia pmattione-nvidia added the Performance, improvement, and non-breaking labels Jan 21, 2025
@pmattione-nvidia pmattione-nvidia self-assigned this Jan 21, 2025
@pmattione-nvidia pmattione-nvidia requested a review from a team as a code owner January 21, 2025 19:17
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Jan 21, 2025
@vuule
Contributor

vuule commented Jan 24, 2025

Could you post the impact of the change on the benchmarks? Not required to merge IMO, but it's nice to keep such results available long-term.

Contributor

@vuule vuule left a comment

Looks good. I really like the simplification in calc_threads_per_string_log2.

cpp/src/io/parquet/page_string_utils.cuh (two outdated review threads, resolved)
@pmattione-nvidia
Contributor Author

> Could you post the impact of the change on the benchmarks? Not required to merge IMO, but it's nice to keep such results available long-term.

Done

@galipremsagar
Contributor

@pmattione-nvidia I cancelled the most recent workflow on this PR to free up resources for #17771, which is needed to unblock all of cudf CI.

I'll rerun once #17771 is merged.

@pmattione-nvidia
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit be1f76c into rapidsai:branch-25.02 Jan 28, 2025
107 checks passed
Labels
  • improvement: Improvement / enhancement to an existing function
  • libcudf: Affects libcudf (C++/CUDA) code.
  • non-breaking: Non-breaking change
  • Performance: Performance related issue
Projects
Status: Landed
5 participants