
Extract actions from sidecar files referenced in checkpoint batches #670

Open
sebastiantia opened this issue Jan 31, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

@sebastiantia
Collaborator

Please describe why this is necessary.

These changes are part of V2 checkpoint read support.
For a scan, we need to build the list of add and remove actions that make up the table’s state. With V2 checkpoints, these actions may live in sidecar files referenced by the checkpoint rather than in the checkpoint file itself, so these changes are required to read them from those sidecar files.

Describe the functionality you are proposing.

To create the actions iterator, we chain together:

  1. An iterator of actions from commit files
  2. An iterator of actions from a checkpoint file
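A minimal sketch of the chaining step, assuming a hypothetical, simplified `ActionBatch` type in place of the kernel's real `EngineData` batches. It only shows the ordering: commit batches are yielded first, followed by checkpoint batches.

```rust
/// Hypothetical, simplified stand-in for a batch of actions
/// (the kernel's real batch type is `EngineData`).
#[derive(Debug, Clone, PartialEq)]
pub struct ActionBatch {
    pub source: &'static str, // "commit" or "checkpoint", for illustration only
    pub actions: Vec<String>,
}

/// Hypothetical helper: yields all commit batches first, then all
/// checkpoint batches, matching the order listed above.
pub fn actions_iter<C, K>(commit_batches: C, checkpoint_batches: K) -> impl Iterator<Item = ActionBatch>
where
    C: IntoIterator<Item = ActionBatch>,
    K: IntoIterator<Item = ActionBatch>,
{
    commit_batches.into_iter().chain(checkpoint_batches)
}
```

Using `Iterator::chain` keeps the whole pipeline lazy: no checkpoint batch is produced until every commit batch has been consumed.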

For every batch of EngineData from a checkpoint file:

Visit the rows of each checkpoint batch with the new SidecarVisitor. This visitor collects all sidecar file paths found in sidecar actions within a checkpoint batch.
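A simplified sketch of what the SidecarVisitor collects. The `CheckpointRow` enum and the `visit_rows` method are hypothetical: the real visitor would read the sidecar `path` column from `EngineData` rather than matching on an in-memory enum.

```rust
/// Hypothetical, simplified model of one row of a checkpoint batch.
#[derive(Debug, Clone, PartialEq)]
pub enum CheckpointRow {
    Add { path: String },
    Sidecar { path: String },
    Other, // e.g. txn, metadata, protocol
}

/// Hypothetical visitor: collects the path of every sidecar action in a batch.
#[derive(Debug, Default)]
pub struct SidecarVisitor {
    pub sidecar_paths: Vec<String>,
}

impl SidecarVisitor {
    /// Visits each row and records the path of any sidecar action found.
    pub fn visit_rows(&mut self, rows: &[CheckpointRow]) {
        for row in rows {
            if let CheckpointRow::Sidecar { path } = row {
                self.sidecar_paths.push(path.clone());
            }
        }
    }
}
```

An empty `sidecar_paths` after visiting a batch is what selects the "no sidecar file paths" branch described below.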

  • If sidecar file paths exist:
    Read the corresponding sidecar files, producing an iterator over the batches of actions they contain.
    Replace the originating checkpoint batch with these sidecar batches, which contain the add actions that make up the table’s state.

  • If no sidecar file paths exist:
    Leave the checkpoint batch as-is in the checkpoint batches iterator, since it already contains the add actions that make up the table’s state.
    Note: a batch may contain no add actions and only other actions (such as txn, metadata, or protocol). This is safe because the non-file actions will be ignored.
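The per-batch decision above can be sketched with `Iterator::flat_map`. Everything here is hypothetical: `CheckpointRow` is a simplified row model, and `read_sidecar` is a caller-supplied closure standing in for the engine's Parquet reader. For simplicity, this sketch reads all sidecars referenced by one checkpoint batch eagerly, while the checkpoint batches themselves are still processed lazily.

```rust
/// Hypothetical, simplified model of one row of a checkpoint or sidecar batch.
#[derive(Debug, Clone, PartialEq)]
pub enum CheckpointRow {
    Add { path: String },
    Sidecar { path: String },
    Other, // e.g. txn, metadata, protocol
}

pub type Batch = Vec<CheckpointRow>;

/// Hypothetical helper: replaces each checkpoint batch that references sidecar
/// files with the batches read from those files; batches without sidecar
/// references are passed through unchanged.
pub fn expand_checkpoint_batches<I, F>(checkpoint_batches: I, read_sidecar: F) -> impl Iterator<Item = Batch>
where
    I: IntoIterator<Item = Batch>,
    F: Fn(&str) -> Vec<Batch>, // stand-in for reading one sidecar file
{
    checkpoint_batches.into_iter().flat_map(move |batch| {
        // Collect sidecar paths, as the SidecarVisitor would.
        let sidecar_paths: Vec<String> = batch
            .iter()
            .filter_map(|row| match row {
                CheckpointRow::Sidecar { path } => Some(path.clone()),
                _ => None,
            })
            .collect();

        if sidecar_paths.is_empty() {
            // No sidecars: keep the original checkpoint batch.
            vec![batch]
        } else {
            // Sidecars: swap in the batches read from every referenced file.
            sidecar_paths
                .iter()
                .flat_map(|p| read_sidecar(p.as_str()))
                .collect()
        }
    })
}
```

When a batch is replaced, its non-file rows (such as `Other` here) are dropped. In this sketch that is harmless because only add and remove actions are needed to build the scan's file list.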

Additional context

No response

sebastiantia added the enhancement (New feature or request) label Jan 31, 2025