Core: Select for rewriting the files belonging to old partitioning schemes #12083

Open: adrians wants to merge 1 commit into main

@adrians adrians commented Jan 24, 2025

After a partition evolution, the rewrite_data_files procedure, run with its default (implicit) selection criteria, does not pick up files that belong to the old partitioning scheme.

The only ways to rewrite those files are to use the rewrite-all option, which also rewrites files that are already correctly sized and correctly placed (this becomes problematic for tables larger than 1 TB), or to run a series of copy-on-write updates (which doesn't fit the environment I work in, since the table will not be written to for a while).
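
For reference, the rewrite-all workaround can be sketched as the Spark SQL statement below. This is a minimal illustration: the helper `rewrite_all_call` and the catalog and table names are hypothetical, and only the `rewrite_data_files` procedure and its `rewrite-all` option come from Iceberg itself.

```python
def rewrite_all_call(catalog: str, table: str) -> str:
    """Build a Spark SQL CALL statement that forces every data file to be rewritten.

    Hypothetical helper for illustration; it only formats the statement text.
    """
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        "options => map('rewrite-all', 'true'))"
    )


# 'my_catalog' and 'db.events' are placeholder names, not taken from this PR.
statement = rewrite_all_call("my_catalog", "db.events")
print(statement)
```

The resulting string would be passed to `spark.sql(...)` in a session with an Iceberg catalog configured. Because `rewrite-all` ignores file size and placement, every file in the table is rewritten, which is the cost described above.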

In this fix, I've changed the default selection criteria of the rewrite_data_files procedure so that they also target files written under the old partitioning scheme.
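
The intended selection behavior can be modeled roughly as follows. This is a simplified sketch, not the code in this PR: the `DataFile` class, the `select_for_rewrite` function, and the size thresholds are all assumptions made for illustration. It assumes a file qualifies when it is outside the size bounds or when its partition spec id differs from the table's current one.

```python
from dataclasses import dataclass
from typing import List

MB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file entry in table metadata."""
    path: str
    size_bytes: int
    spec_id: int  # id of the partition spec the file was written with


def select_for_rewrite(
    files: List[DataFile],
    current_spec_id: int,
    min_size: int,
    max_size: int,
) -> List[DataFile]:
    """Pick files that are badly sized OR written under an outdated partition spec."""
    return [
        f
        for f in files
        if f.size_bytes < min_size
        or f.size_bytes > max_size
        or f.spec_id != current_spec_id  # the additional criterion proposed here
    ]


# Illustrative data: the table's current spec id is assumed to be 1.
files = [
    DataFile("a.parquet", 500 * MB, spec_id=0),  # well sized, but old spec
    DataFile("b.parquet", 500 * MB, spec_id=1),  # well sized, current spec
    DataFile("c.parquet", 10 * MB, spec_id=1),   # too small
]
selected = select_for_rewrite(files, current_spec_id=1, min_size=384 * MB, max_size=900 * MB)
print([f.path for f in selected])
```

In this model, `b.parquet` is left alone because it is correctly sized and already belongs to the current spec, so a large table only pays for rewriting the files that actually need it.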

@github-actions github-actions bot added the core label Jan 24, 2025
@adrians adrians force-pushed the main-rewrite-unpartitioned branch from 486cc1f to d5a461c Compare January 24, 2025 08:53
@adrians adrians changed the title from "Spark 3.5: Select for rewriting the files belonging to old partitioning schemes" to "Core: Select for rewriting the files belonging to old partitioning schemes" Jan 24, 2025