Update batch processing to support compacted inputs #530

Merged: 2 commits merged into TimelyDataflow:master on Oct 29, 2024

Conversation

frankmcsherry
Member

@frankmcsherry frankmcsherry commented Oct 26, 2024

The count_total and threshold_total operators processed their input one batch at a time, which does not work when the data are pre-compacted: a prefix of the batches may have lower and upper frontiers that are no longer valid cut points for the trace, so we cannot get a cursor to "just before" either of them.

The fix used in reduce.rs is to drain all input batches before processing any of them, so that when invoked on compacted data we draw all batches currently available, up through the minting frontier of the arrangement operator. All drained input batches are processed as one batch, using the CursorList tool to bundle them and treat them as a single cursor. We only end up using the upper of the last batch to get a trace cursor, and that frontier is more likely to be valid than the other batch uppers. It should in fact be valid: each worker is single-threaded, so there should be no concurrency issues with the batches somehow not being available.
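As a rough illustration of the "drain everything, then treat it as one batch" idea, here is a minimal standalone sketch. It does not use the differential-dataflow API: `Batch`, `drain_as_one`, and the totally ordered `u64` times are hypothetical stand-ins (real batches carry antichain frontiers, and the real code bundles cursors with CursorList rather than summing into a map). It also assumes the queued batches are contiguous, each one starting where the previous one ended.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical stand-in for an arrangement batch: updates in the
/// half-open time interval [lower, upper). Real frontiers are antichains;
/// a single u64 is a simplification for this sketch.
#[derive(Debug, Clone)]
struct Batch {
    lower: u64,
    upper: u64,
    /// (key, time, diff) triples.
    updates: Vec<(String, u64, i64)>,
}

/// Drain every queued batch and process them as one logical batch.
///
/// Returns the accumulated diff per key together with the upper of the
/// last drained batch, which is the only frontier a caller would then use
/// to position a trace cursor. Returns `None` if nothing was queued.
fn drain_as_one(queue: &mut VecDeque<Batch>) -> Option<(BTreeMap<String, i64>, u64)> {
    // Take everything currently available before processing any of it.
    let batches: Vec<Batch> = queue.drain(..).collect();
    let last_upper = batches.last()?.upper;

    // Assumption of this sketch: batches arrive contiguously.
    for pair in batches.windows(2) {
        assert_eq!(pair[0].upper, pair[1].lower, "batches must be contiguous");
    }

    // Walk all batches as if they were a single one.
    let mut merged = BTreeMap::new();
    for batch in &batches {
        for (key, _time, diff) in &batch.updates {
            *merged.entry(key.clone()).or_insert(0) += diff;
        }
    }
    // Keys whose updates cancel out contribute nothing.
    merged.retain(|_, diff| *diff != 0);

    Some((merged, last_upper))
}
```

Intermediate uppers never leave the function, so a caller cannot accidentally ask the trace for a cursor at a frontier that compaction has already invalidated.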

fixes #526

@frankmcsherry frankmcsherry merged commit 8b61715 into TimelyDataflow:master Oct 29, 2024
7 checks passed
@github-actions github-actions bot mentioned this pull request Oct 29, 2024
Successfully merging this pull request may close these issues.

count_total panic with advanced trace