Add batched streaming aggregations #324
Conversation
With the current model, we expect each Next call to return samples for unique steps. This approach works well because of its simplicity, but for high-cardinality queries (100K+ series) it tends to use a lot of memory because the buffers for each step can get large. This commit resolves that by allowing the aggregate to handle batches from the same step arriving over subsequent Next calls. Selectors are extended with a batchSize parameter, which can be injected when a streaming aggregate is present in the plan. This parameter puts an upper limit on the size of the output vectors they produce.

Signed-off-by: Filip Petkovski <[email protected]>
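A rough sketch of the idea described above, in Go. These types (`batchingSelector`, `sumAggregate`) and field names are illustrative, not the engine's actual API: the selector emits at most `batchSize` samples per `Next` call, and the aggregate folds successive batches from the same step into a running result instead of buffering the full vector.

```go
package main

import "fmt"

// sample is a simplified (series, value) pair for a single step timestamp.
type sample struct {
	series int
	value  float64
}

// batchingSelector yields samples for one step in fixed-size batches
// instead of one unbounded vector (hypothetical type, for illustration).
type batchingSelector struct {
	samples   []sample
	batchSize int
	pos       int
}

// Next returns the next batch for the current step, or nil when exhausted.
func (s *batchingSelector) Next() []sample {
	if s.pos >= len(s.samples) {
		return nil
	}
	end := s.pos + s.batchSize
	if end > len(s.samples) {
		end = len(s.samples)
	}
	batch := s.samples[s.pos:end]
	s.pos = end
	return batch
}

// sumAggregate folds successive batches from the same step into a running
// sum, so it never holds the full step vector in memory at once.
type sumAggregate struct{ total float64 }

func (a *sumAggregate) accept(batch []sample) {
	for _, smp := range batch {
		a.total += smp.value
	}
}

func main() {
	sel := &batchingSelector{batchSize: 3}
	for i := 0; i < 10; i++ {
		sel.samples = append(sel.samples, sample{series: i, value: 1})
	}
	agg := &sumAggregate{}
	batches := 0
	for batch := sel.Next(); batch != nil; batch = sel.Next() {
		agg.accept(batch)
		batches++
	}
	fmt.Println(batches, agg.total) // prints "4 10": 10 samples in batches of ≤3
}
```

The key property is that peak memory per step is bounded by `batchSize` rather than by series cardinality.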
@fpetkovski Can you please add more description to the graph? What do the yellow and green lines stand for?
Hey @yeya24, I've added a brief description; let me know if that helps. I can't really post the legend because it has some internal details.
lgtm
LGTM
This is a before-and-after comparison of the total heap size of all queriers for a 1M series query. The green line shows the total heap size (`sum(go_memstats_heap_inuse_bytes)`) of all queriers in the query path when executing the query with this change. The yellow line shows the total memory used by all queriers in the query path on the main branch of the engine. There is approximately a 20% reduction in heap size because vector batches from the vector selector are capped at 32K instead of being unbounded as they are on main.
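To make the cap concrete: assuming the 32K cap means 32,768 samples per batch (an assumption; the source only says "32K"), streaming one step of the 1M series query takes a few dozen small Next calls instead of one 1M-entry vector.

```go
package main

import "fmt"

func main() {
	const (
		series    = 1_000_000 // cardinality of the benchmark query
		batchSize = 32_768    // assumed value of the 32K cap on selector output vectors
	)
	// Number of Next calls needed to stream one step's samples: ceil(series / batchSize).
	batches := (series + batchSize - 1) / batchSize
	fmt.Println(batches) // prints 31
}
```

Each of those 31 batches can be released or reused before the next one arrives, which is where the heap savings come from.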