feat: support sampling table with block and row level simultaneously #16613

Merged
merged 3 commits into from
Oct 16, 2024

Conversation

xudong963
Member

@xudong963 xudong963 commented Oct 15, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) Oct 15, 2024

what-the-diff bot commented Oct 15, 2024

PR Summary

  • Revised Sampling Structure
    The older 'Sample' and 'SampleLevel' structures, which handled data sampling, have been replaced by a single 'SampleConfig' structure that carries the sampling settings.

  • New SampleRowLevel Enum
    A new enum, 'SampleRowLevel', has been added. An enum is a type whose value is one of a fixed set of variants; here the variants distinguish the row-level sampling methods, such as sampling by number of rows and sampling by probability.

  • Improved Implementation across Codebase
    Code across the codebase now imports and uses the new 'SampleConfig' structure, including table scanning, querying, parsing, and formatting, so that sampling is handled consistently.

  • Updated Display and Error Handling
    Display formatting now renders the new sample configuration. Error handling and sampling logic were also updated to take into account only row-level sampling.

  • Updated Parser and Binder Modules
    The parser and binder modules were changed to support the new sampling structure. The parser turns SQL text into a syntax tree, and the binder resolves that tree against table metadata.

  • Enhanced Random Sampling Logic
    The random sampling logic now uses the block-level probability setting provided by the new 'SampleConfig' structure.
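
To make the summary above concrete, here is a minimal Rust sketch of how a combined block-level and row-level sampling config could be shaped and applied. It is not the code from this PR: only the names 'SampleConfig' and 'SampleRowLevel' come from the summary, while the field names, variant names, the `sample` and `row_probability` helpers, the use of probabilities in [0, 1], and the small `XorShift64` generator (a stand-in so the sketch needs no external crates) are all assumptions made for illustration.

```rust
/// Row-level sampling mode. Variant names and payloads are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum SampleRowLevel {
    /// Keep roughly this many rows in total.
    RowsNum(u64),
    /// Keep each row with this probability, in [0, 1].
    Probability(f64),
}

/// Hypothetical combined config: each level can be set independently,
/// so both can apply to the same table scan.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct SampleConfig {
    pub row_level: Option<SampleRowLevel>,
    /// Probability in [0, 1] of keeping each block (assumed unit).
    pub block_level: Option<f64>,
}

/// Tiny xorshift64 generator, used only to keep the sketch self-contained.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The xorshift state must be non-zero.
        XorShift64(seed.max(1))
    }

    /// Returns a float in [0, 1).
    pub fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Converts a row-level setting into a per-row keep probability.
/// For RowsNum this simply divides by the total row count (a simplification).
pub fn row_probability(level: &SampleRowLevel, total_rows: usize) -> f64 {
    match level {
        SampleRowLevel::Probability(p) => *p,
        SampleRowLevel::RowsNum(n) => {
            if total_rows == 0 {
                0.0
            } else {
                (*n as f64 / total_rows as f64).min(1.0)
            }
        }
    }
}

/// Two-stage sampling: first decide per block, then per row in kept blocks.
pub fn sample(blocks: &[Vec<i64>], cfg: &SampleConfig, rng: &mut XorShift64) -> Vec<Vec<i64>> {
    let total_rows: usize = blocks.iter().map(|b| b.len()).sum();
    let row_p = cfg.row_level.as_ref().map(|l| row_probability(l, total_rows));
    let mut out = Vec::new();
    for block in blocks {
        // Block level: skip the whole block without looking at its rows.
        if let Some(p) = cfg.block_level {
            if rng.next_f64() >= p {
                continue;
            }
        }
        // Row level: filter rows inside each surviving block.
        let rows: Vec<i64> = match row_p {
            Some(p) => block.iter().copied().filter(|_| rng.next_f64() < p).collect(),
            None => block.clone(),
        };
        out.push(rows);
    }
    out
}
```

Calling `sample` with both fields set drops whole blocks first and then thins the rows of the blocks that remain, which is the sense in which the two levels apply simultaneously. With `SampleConfig::default()` neither level is active and the input is returned unchanged.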

@xudong963 xudong963 marked this pull request as draft October 15, 2024 14:53
@xudong963 xudong963 marked this pull request as ready for review October 16, 2024 03:28
@Dousir9 Dousir9 added this pull request to the merge queue Oct 16, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 16, 2024
@xudong963 xudong963 added this pull request to the merge queue Oct 16, 2024
@BohuTANG BohuTANG removed this pull request from the merge queue due to a manual request Oct 16, 2024
@BohuTANG BohuTANG merged commit 08e98f5 into databendlabs:main Oct 16, 2024
75 checks passed