
Add batched streaming aggregations #324

Merged: 5 commits merged into thanos-io:main on Nov 19, 2023

Conversation

@fpetkovski (Collaborator) commented Nov 6, 2023

With the current model, we expect each Next call to return samples for unique steps. This approach works well because of its simplicity, but for high-cardinality queries (100K+ series) it tends to use a lot of memory, since the buffer for each step can grow very large.

This commit resolves that by allowing the aggregate operator to handle batches for the same step arriving over subsequent Next calls. Selectors are expanded with a batchSize parameter, which can be injected when a streaming aggregate is present in the plan. Using this parameter, selectors can put an upper limit on the size of the output vectors they produce.
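To make the new flow concrete, here is a minimal, self-contained sketch (not the engine's actual operator interface; stepBatch, sumAggregate, and accept are hypothetical names) of an aggregate that keeps accumulating while incoming batches carry the same step and only flushes a result once the step changes:

```go
package main

import "fmt"

// stepBatch is a hypothetical slice of samples that all belong to one
// evaluation step; a selector may emit several batches for the same step.
type stepBatch struct {
	step    int64     // evaluation timestamp of the step
	samples []float64 // at most batchSize samples per batch
}

// sumAggregate folds batches into per-step sums. Because consecutive
// batches can carry the same step, it only finalizes a step when it
// sees a batch for a later one.
type sumAggregate struct {
	currentStep int64
	accum       float64
	started     bool
}

func (a *sumAggregate) accept(b stepBatch, emit func(step int64, sum float64)) {
	if a.started && b.step != a.currentStep {
		// A new step arrived: the previous step is complete, flush it.
		emit(a.currentStep, a.accum)
		a.accum = 0
	}
	a.currentStep, a.started = b.step, true
	for _, s := range b.samples {
		a.accum += s
	}
}

func (a *sumAggregate) close(emit func(step int64, sum float64)) {
	if a.started {
		emit(a.currentStep, a.accum)
	}
}

func main() {
	// Two batches for step 0 followed by one batch for step 30: the
	// aggregate keeps accumulating until the step changes.
	batches := []stepBatch{
		{step: 0, samples: []float64{1, 2, 3}},
		{step: 0, samples: []float64{4, 5}},
		{step: 30, samples: []float64{6}},
	}
	agg := &sumAggregate{}
	for _, b := range batches {
		agg.accept(b, func(step int64, sum float64) { fmt.Printf("step=%d sum=%g\n", step, sum) })
	}
	agg.close(func(step int64, sum float64) { fmt.Printf("step=%d sum=%g\n", step, sum) })
}
```

The key difference from the previous model is that a single Next call no longer has to deliver all samples for a step, so per-step buffers can stay bounded.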

This is a before-and-after comparison of the total heap size of all queriers for a 1M-series query. The green line shows the total heap size (sum(go_memstats_heap_inuse_bytes)) across all queriers in the query path when executing the query with this change; the yellow line shows the total memory used by the same queriers on the main branch of the engine.

There is approximately a 20% reduction in heap size, because vector batches from the vector selector are capped at 32K samples instead of being unbounded as they are on main.

[Graph: total querier heap size for the 1M-series query; green = with this change, yellow = main]
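For the selector side, the sketch below (batchSteps is a hypothetical helper, and 32K is taken to mean 32,768 samples) shows how capping the batch size turns one unbounded 100K-series step vector into a handful of bounded ones:

```go
package main

import "fmt"

// batchSteps is a hypothetical helper showing how a selector could cap its
// output: instead of emitting one unbounded vector per step, it slices the
// step's samples into chunks of at most batchSize and emits them one by one.
func batchSteps(samples []float64, batchSize int, emit func(batch []float64)) {
	for start := 0; start < len(samples); start += batchSize {
		end := start + batchSize
		if end > len(samples) {
			end = len(samples)
		}
		emit(samples[start:end])
	}
}

func main() {
	step := make([]float64, 100_000) // a high-cardinality step: 100K series
	count := 0
	batchSteps(step, 32_768, func(batch []float64) {
		count++
		fmt.Printf("batch %d: %d samples\n", count, len(batch))
	})
	// With a 32K cap the selector never materializes more than 32,768
	// samples at once, which is where the heap reduction comes from.
}
```

Under these assumptions the 100K-series step is emitted as three full 32,768-sample batches plus a final 1,696-sample batch, rather than one 100K-sample vector, which is why peak heap usage drops.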

@yeya24 (Contributor) commented Nov 6, 2023

@fpetkovski Can you please add more description to the graph? What do the yellow and green lines stand for?

@fpetkovski (Collaborator, Author) commented

Hey @yeya24, I've added a brief description; let me know if that helps. I can't really post the legend because it has some internal details.

@MichaHoffmann (Contributor) left a review comment

lgtm

@yeya24 (Contributor) left a review comment

LGTM

@yeya24 merged commit 998354b into thanos-io:main on Nov 19, 2023
6 checks passed