fix: keep column statistics of all NULL column #16753
Merged
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
In PR #16728, statistics for columns with unsupported data types are excluded.
However, the data type is inferred from the scalar instead of from the type of the column, so an edge case arises for columns that contain only NULL values:
For such columns, both the min and max scalar values are NULL, so the inferred type fails the supported_stat_type check and the statistics are excluded. This leads to issues during pruning, because RangePruner cannot prune on these columns without available statistics. The table data is safe and the correctness of filtering is retained, but the execution of pruning may be inefficient:
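A minimal sketch of the edge case described above, using simplified stand-in types. The `Scalar` and `DataType` enums, `infer_type`, and both `keep_stats_*` helpers are hypothetical and are not Databend's actual definitions; the sketch also assumes that the NULL type is not a supported statistics type.

```rust
// Hypothetical, simplified stand-ins; not Databend's real types.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Null,
    Int(i64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Null,
    Int,
    NullableInt,
}

// Infer a type from a scalar value alone: a NULL scalar carries no
// information about the column it came from.
fn infer_type(s: &Scalar) -> DataType {
    match s {
        Scalar::Null => DataType::Null,
        Scalar::Int(_) => DataType::Int,
    }
}

// Assumption: the NULL type itself is not a supported statistics type.
fn supported_stat_type(t: DataType) -> bool {
    matches!(t, DataType::Int | DataType::NullableInt)
}

// Decision based on the scalar: drops statistics of all-NULL columns.
fn keep_stats_by_scalar(min: &Scalar) -> bool {
    supported_stat_type(infer_type(min))
}

// Decision based on the declared column type: keeps them.
fn keep_stats_by_column(column_type: DataType) -> bool {
    supported_stat_type(column_type)
}
```

For a nullable integer column that holds only NULLs, `keep_stats_by_scalar(&Scalar::Null)` rejects the statistics, while `keep_stats_by_column(DataType::NullableInt)` would keep them.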
Example
Please disable the table meta cache in the query config file to reproduce this issue:
Changes
In this PR, column statistics with both NULL min and max values are retained. Since they will be stored as Scalar::Null, they can be serialized and deserialized without issue.
Note: changing databend_storages_common_table_meta::meta::supported_stat_type may also work, but to minimize the risks (since other components also rely on it), the changes are kept in ColStatsVisitor.
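The retention rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the `Scalar` enum, `is_supported`, and `should_keep_stats` are hypothetical names, and the `Unsupported` variant stands in for any scalar whose type has no usable statistics.

```rust
// Hypothetical, simplified stand-in; not Databend's real Scalar type.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Null,
    Int(i64),
    // Placeholder for any scalar whose type has no usable statistics.
    Unsupported,
}

// Assumption: only integers count as supported in this sketch.
fn is_supported(s: &Scalar) -> bool {
    matches!(s, Scalar::Int(_))
}

// Keep statistics when both bounds are NULL (an all-NULL column),
// otherwise fall back to the supported-type check on both bounds.
fn should_keep_stats(min: &Scalar, max: &Scalar) -> bool {
    let all_null = matches!((min, max), (Scalar::Null, Scalar::Null));
    all_null || (is_supported(min) && is_supported(max))
}
```

Keeping the special case next to the statistics collection, rather than widening the shared type check, mirrors the choice described in the note above: other callers of the shared check are unaffected.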
For tables that have already been processed (compact/insert, etc.) with PR #16728, it is safe to apply the changes of this PR:
Tests
Type of change