
feat: add deterministic block level sampling for small datasets #16670

Merged: 2 commits, Oct 24, 2024

Conversation

xudong963
Member

@xudong963 xudong963 commented Oct 23, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

The PR contains two things:

  1. During testing, I found that block-level table sampling always produces unstable results with large errors. To address this, the PR adds deterministic block-level sampling for small datasets (parts <= 100).
  2. Add block-level sampling (50%) when estimating filter selectivity, to reduce I/O cost.
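
To illustrate the first point, here is a minimal sketch of how deterministic block-level sampling could work. It is not taken from the PR's diff: the function names `use_deterministic_sampling` and `deterministic_block_sample` are hypothetical, and the evenly spaced selection strategy is an assumption. Only the threshold of 100 parts comes from the description above. The idea is that, for a small number of blocks, independent random draws per block can keep far more or far fewer blocks than the requested fraction, while picking a fixed count of evenly spaced blocks gives the same sample size on every run.

```rust
/// Threshold from the PR description: datasets with at most this many
/// parts are treated as "small".
const SMALL_DATASET_THRESHOLD: usize = 100;

/// Hypothetical helper: decide whether the deterministic path applies.
fn use_deterministic_sampling(num_parts: usize) -> bool {
    num_parts <= SMALL_DATASET_THRESHOLD
}

/// Hypothetical helper: return the indices of the blocks to keep.
///
/// Assumed strategy: keep `round(num_blocks * probability)` blocks
/// (at least one), spread evenly across the block list.
fn deterministic_block_sample(num_blocks: usize, probability: f64) -> Vec<usize> {
    // Nothing to sample, or a non-positive / NaN probability.
    if num_blocks == 0 || !(probability > 0.0) {
        return Vec::new();
    }
    let p = probability.min(1.0);
    let target = ((num_blocks as f64) * p).round() as usize;
    // Keep at least one block and never more than exist.
    let target = target.clamp(1, num_blocks);
    // Evenly spaced indices: 0, n/t, 2n/t, ... (integer division).
    (0..target).map(|i| i * num_blocks / target).collect()
}
```

Under these assumptions, sampling 50% of 10 blocks always keeps blocks 0, 2, 4, 6 and 8, so an estimate scaled by the sampling ratio does not vary between runs. Larger datasets would fall back to ordinary random block sampling.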

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Oct 23, 2024
@xudong963 xudong963 added this pull request to the merge queue Oct 24, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Oct 24, 2024
@BohuTANG BohuTANG merged commit d1ad6f4 into databendlabs:main Oct 24, 2024
72 checks passed
@xudong963 xudong963 deleted the d_sample branch October 24, 2024 00:29