Update batch processing to support compacted inputs #530
The `count_total` and `threshold_total` operators worked one batch at a time, which breaks when the data are pre-compacted: a prefix of the batches may have `lower` and `upper` frontiers that are no longer valid cut points for the trace, and we cannot get a cursor to "just before" either of them.

The fix, as used in `reduce.rs`, is to drain all input batches before processing any. When invoked on compacted data, the operator then draws every batch currently available, up through the minting frontier of the arrangement operator. All drained input batches are processed as one batch, using the `CursorList` tool to bundle them and treat them as one. We only use the last batch's `upper` to get a trace cursor, and it is more likely to be valid than the other batch uppers. It should in fact be valid: each worker is single-threaded, so there should be no concurrency issues with the batches somehow not being available.

fixes #526
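To make the control flow concrete, here is a minimal Rust sketch under stated assumptions. It is a toy model, not the differential-dataflow API: `Batch`, `drain_all`, and `CountTotal` are hypothetical names, batches are plain vectors of `(key, time, diff)` updates over `u64` times, and concatenating the drained updates stands in for `CursorList`. It shows the steps described above: drain every queued batch, process them as one batch spanning the first `lower` to the last `upper`, and advance the operator's cut point only to that last `upper`. It does not model trace compaction itself.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified batch (not differential-dataflow's `Batch`):
/// updates `(key, time, diff)` for times in `[lower, upper)`.
#[derive(Clone, Debug)]
struct Batch {
    lower: u64,
    upper: u64,
    updates: Vec<(String, u64, i64)>,
}

/// Drains every queued batch and bundles them into one logical batch,
/// standing in for `CursorList`: it spans from the first batch's `lower`
/// to the last batch's `upper`. Returns `None` if nothing was queued.
fn drain_all(queue: &mut Vec<Batch>) -> Option<Batch> {
    let batches: Vec<Batch> = queue.drain(..).collect();
    let (first, last) = (batches.first()?, batches.last()?);
    // Assumption: batches arrive in order and are contiguous.
    for pair in batches.windows(2) {
        assert_eq!(pair[0].upper, pair[1].lower, "batches must be contiguous");
    }
    let mut updates = Vec::new();
    for batch in &batches {
        updates.extend(batch.updates.iter().cloned());
    }
    Some(Batch { lower: first.lower, upper: last.upper, updates })
}

/// Hypothetical `count_total`-like operator state.
struct CountTotal {
    /// Cut point through which `totals` is complete: the last `upper` seen,
    /// playing the role of the frontier used to obtain a trace cursor.
    through: u64,
    /// Accumulated count per key as of `through`.
    totals: BTreeMap<String, i64>,
}

impl CountTotal {
    /// Processes all queued batches as one, emitting `((key, count), time, diff)`
    /// changes to each key's total count.
    fn step(&mut self, queue: &mut Vec<Batch>) -> Vec<((String, i64), u64, i64)> {
        let mut output = Vec::new();
        let batch = match drain_all(queue) {
            Some(batch) => batch,
            None => return output,
        };
        // The combined batch must begin exactly at the previous cut point.
        assert_eq!(batch.lower, self.through, "combined batch must start at previous upper");
        let upper = batch.upper;

        // Consolidate diffs per (key, time); BTreeMap order visits each key's times in order.
        let mut consolidated: BTreeMap<(String, u64), i64> = BTreeMap::new();
        for (key, time, diff) in batch.updates {
            *consolidated.entry((key, time)).or_insert(0) += diff;
        }

        for ((key, time), diff) in consolidated {
            if diff == 0 {
                continue;
            }
            let total = self.totals.entry(key.clone()).or_insert(0);
            let old = *total;
            *total += diff;
            let new = *total;
            // Retract the old count and assert the new one (zero counts are not reported).
            if old != 0 {
                output.push(((key.clone(), old), time, -1));
            }
            if new != 0 {
                output.push(((key, new), time, 1));
            }
        }

        // Advance the cut point only to the last drained `upper`.
        self.through = upper;
        output
    }
}

fn main() {
    let mut op = CountTotal { through: 0, totals: BTreeMap::new() };
    // Two contiguous batches, as might be available at once after compaction.
    let mut queue = vec![
        Batch { lower: 0, upper: 2, updates: vec![("a".into(), 0, 1), ("a".into(), 1, 1)] },
        Batch { lower: 2, upper: 5, updates: vec![("a".into(), 3, -1), ("b".into(), 4, 1)] },
    ];
    let output = op.step(&mut queue);
    for update in &output {
        println!("{:?}", update);
    }
    assert!(queue.is_empty());
    assert_eq!(op.through, 5);
}
```

Because `through` only ever moves to the last drained `upper`, the intermediate upper at time 2 is never used as a cut point, which is the property the change relies on when compaction has invalidated such intermediate frontiers.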