Replies: 9 comments
-
Also noticing that most of my compactors are spending time in a block deletion/cleanup step rather than compacting; they keep switching between this deletion process and compaction.
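For context, this is roughly the pair of flags that governs how often that cleanup pass runs; the paths and durations below are illustrative assumptions only, not a recommendation:

```sh
# Sketch only: these flags control how long deletion-marked blocks are kept
# and how often the compactor runs its cleanup pass between compactions.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --wait \
  --delete-delay=48h \
  --compact.cleanup-interval=5m
```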
-
Compaction might not always be happening. The compactor waits until there is enough data to be compacted. Even if you specify a concurrency of 16, that doesn't mean there are always 16 compaction jobs to run; you probably only have 1, so only 1 core is used. We don't support using more than 1 core within a single compaction job because it is currently single-threaded. Before it actually compacts blocks, the compactor may also spend quite a long time downloading the required blocks and analyzing the index files, so CPU usage can still look low during that phase since it is IO intensive.
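To make this concrete, here is a minimal, hypothetical invocation (paths and file names are assumptions): raising the concurrency flags only helps when there are enough independent compaction jobs to fill those slots, so with a single stream most of the concurrency sits idle.

```sh
# Assumed paths/files for illustration. With only one compaction group,
# --compact.concurrency=16 still results in at most one active job,
# so CPU stays mostly idle apart from the IO-heavy download/analysis phase.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --wait \
  --compact.concurrency=16 \
  --downsample.concurrency=16
```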
-
Got it, that clears up my confusion. What's the definition of a compaction job here? And in what scenario does more concurrency come into effect?
-
A single compaction which produces 1 output block.
I can imagine that if you have multiple clusters with different cluster labels, multiple compaction jobs will be available, since each cluster (they should have their own external labels) will have its own compaction job at the same time. Within a single compaction group there might also be multiple compaction jobs available (imagine you have a huge compaction backlog), but we only support a concurrency of 1 per group. This is a limitation in Thanos right now.
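One practical consequence (a sketch, not an official recipe): since each external-label group gets at most one concurrent job, you can run one compactor instance per group and filter blocks with the selector relabel config. The `cluster` label name, its value, and the file names below are assumptions.

```sh
# Hypothetical sharding by an assumed "cluster" external label:
# each compactor instance only processes blocks whose external labels match.
cat > selector-cluster-a.yml <<'EOF'
- action: keep
  source_labels: [cluster]
  regex: cluster-a
EOF

thanos compact \
  --data-dir=/var/thanos/compact-cluster-a \
  --objstore.config-file=bucket.yml \
  --wait \
  --selector.relabel-config-file=selector-cluster-a.yml
```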
-
I see. Is this configurable, or does it only recognize a particular label?
Ah, that makes sense. Thanks for clarifying.
-
It is just whatever external labels you configured; it doesn't have to be any particular label name.
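For illustration only (label names and values are assumptions): the grouping key is whatever `external_labels` each Prometheus instance uploads with its blocks, for example:

```sh
# Example Prometheus external_labels fragment written out for illustration;
# any label set works, the compactor simply groups blocks by it.
cat > external-labels-example.yml <<'EOF'
global:
  external_labels:
    cluster: cluster-a
    replica: r0
EOF
```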
-
I will convert this into a discussion.
-
Got it. Thanks a lot for clarifying this. As a follow-up, how exactly does the compactor proceed to compact blocks further within a given compaction group? What I tried initially was to shard based on time period. Is that also limited by the single per-group concurrency? What I noticed is that the compactor going through the latest blocks was able to compact the fresh 2h blocks into 8h blocks, but all the other compactors processing an older time period (a period where all blocks have already been compacted to 8h durations) were idle the whole time. Does it wait for the whole stream to be converted to 8h blocks before compacting further?
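For reference, this is roughly what the time-based sharding described above might look like; the time bounds and paths below are illustrative assumptions:

```sh
# Sketch: one compactor instance restricted to an older time window,
# with another instance (not shown) covering recent data.
thanos compact \
  --data-dir=/var/thanos/compact-old \
  --objstore.config-file=bucket.yml \
  --wait \
  --min-time=2023-01-01T00:00:00Z \
  --max-time=2023-07-01T00:00:00Z
```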
-
Found this issue (#3806) and the corresponding PR (#3807) for concurrency within a given group. What's the maintainers' stance on this? I'm not seeing any activity over there.
-
I'm trying to compact and downsample through a long backlog. We weren't getting past 8h blocks and hence never retaining blocks of `5m` and `1h` resolution.

As per what I read in the documentation (https://thanos.io/tip/operating/compactor-backlog.md/#scale-the-compactor), I did the following to speed up the process: assigned more CPU cores, increased `--compact.concurrency` and `--downsample.concurrency`, and sharded the compactors based on the `prometheus` label.

I observed that, as I increased the CPU cores assigned and the values of `--compact.concurrency` and `--downsample.concurrency`, I wasn't seeing a corresponding increase in CPU usage %, or even in other metrics such as the number of goroutines. I had the impression that if I set a concurrency value equal to the number of cores assigned, I should see a very high CPU percentage, but instead, even with ~16 cores assigned and `--compact.concurrency=16`, CPU usage was still hovering around 10%.

I did notice a visible spike in CPU and memory usage when the compactor was creating blocks of `1h` resolution, which is expected. But my question here is: how can I get full CPU saturation throughout the compaction and downsampling cycle, not just towards the end of it?

TIA. Let me know if any data is needed from my end. I'm on version `0.32.5`.
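For context, retention per resolution is controlled by dedicated flags; the durations below are only illustrative assumptions for a setup like the one described:

```sh
# Sketch: retention settings per resolution (placeholder durations).
# 5m and 1h downsampled blocks only appear once compaction has produced
# long enough source blocks, which is why a backlog delays them.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --wait \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=180d \
  --retention.resolution-1h=365d
```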