Replies: 9 comments
-
Also noticing that most of my compactors are spending time in a block deletion/cleanup step rather than compacting; they keep switching between this deletion process and compaction.
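For context, this is roughly the pair of flags that governs how often that cleanup pass runs; the paths and durations below are illustrative assumptions only, not a recommendation:

```sh
# Sketch only: these flags control how long deletion-marked blocks are kept
# and how often the compactor runs its cleanup pass between compactions.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --wait \
  --delete-delay=48h \
  --compact.cleanup-interval=5m
```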
-
Compaction might not always be happening. The compactor waits until there is enough data to be compacted. Even if you specify a concurrency of 16, that doesn't mean there are always 16 compaction jobs to run; you probably only have 1, so only 1 core is used. We don't support using more than 1 core within a single compaction job because it is currently single-threaded. Before it actually compacts blocks, the compactor may also spend quite a long time downloading the required blocks and analyzing the index files, so CPU usage can still look low during that phase since it is IO intensive.
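To make this concrete, here is a minimal, hypothetical invocation (paths and file names are assumptions): raising the concurrency flags only helps when there are enough independent compaction jobs to fill those slots, so with a single stream most of the concurrency sits idle.

```sh
# Assumed paths/files for illustration. With only one compaction group,
# --compact.concurrency=16 still results in at most one active job,
# so CPU stays mostly idle apart from the IO-heavy download/analysis phase.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --wait \
  --compact.concurrency=16 \
  --downsample.concurrency=16
```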
-
Got it, that clears up my confusion. What's the definition of a compaction job here? And in what scenario does more concurrency come into effect?
-
A single compaction which produces 1 output block.
I can imagine that if you have multiple clusters with different cluster labels, multiple compaction jobs will be available, since each cluster (they should have their own external labels) will have its own compaction job at the same time. Within a single compaction group there might also be multiple compaction jobs available (imagine you have a huge compaction backlog), but we only support a concurrency of 1 per group. This is a limitation in Thanos right now.
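One practical consequence (a sketch, not an official recipe): since each external-label group gets at most one concurrent job, you can run one compactor instance per group and filter blocks with the selector relabel config. The `cluster` label name, its value, and the file names below are assumptions.

```sh
# Hypothetical sharding by an assumed "cluster" external label:
# each compactor instance only processes blocks whose external labels match.
cat > selector-cluster-a.yml <<'EOF'
- action: keep
  source_labels: [cluster]
  regex: cluster-a
EOF

thanos compact \
  --data-dir=/var/thanos/compact-cluster-a \
  --objstore.config-file=bucket.yml \
  --wait \
  --selector.relabel-config-file=selector-cluster-a.yml
```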
-
I see. Is this configurable, or does it only recognize a particular label?
Ah, that makes sense. Thanks for clarifying.
-
It is just whatever external labels you configured; it doesn't have to be any particular label name.
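For illustration only (label names and values are assumptions): the grouping key is whatever `external_labels` each Prometheus instance uploads with its blocks, for example:

```sh
# Example Prometheus external_labels fragment written out for illustration;
# any label set works, the compactor simply groups blocks by it.
cat > external-labels-example.yml <<'EOF'
global:
  external_labels:
    cluster: cluster-a
    replica: r0
EOF
```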
-
I will convert this into a discussion.
-
Got it. Thanks a lot for clarifying this. As a follow-up, how exactly does the compactor proceed to compact blocks further within a given compaction group? What I tried initially was to shard based on time period. Is that also limited by the single per-group concurrency? What I noticed is that the compactor going through the latest blocks was able to compact the fresh 2h blocks into 8h blocks, but all the other compactors processing an older time period (a period where all blocks have already been compacted to 8h durations) were idle the whole time. Does it wait for the whole stream to be converted to 8h blocks before compacting further?
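For reference, this is roughly what the time-based sharding described above might look like; the time bounds and paths below are illustrative assumptions:

```sh
# Sketch: one compactor instance restricted to an older time window,
# with another instance (not shown) covering recent data.
thanos compact \
  --data-dir=/var/thanos/compact-old \
  --objstore.config-file=bucket.yml \
  --wait \
  --min-time=2023-01-01T00:00:00Z \
  --max-time=2023-07-01T00:00:00Z
```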
-
Found this issue (#3806) and the corresponding PR (#3807) for concurrency within a given group. What's the maintainers' stance on this? I'm not seeing any activity over there.
-
I'm trying to compact and downsample through a long backlog. We weren't getting past 8h blocks and hence never retaining blocks of `5m` and `1h` resolution.

As per what I read in the documentation (https://thanos.io/tip/operating/compactor-backlog.md/#scale-the-compactor), I did the following to speed up the process: assigned more CPU cores, increased `--compact.concurrency` and `--downsample.concurrency`, and sharded the compactors based on the `prometheus` label.

I observed that, as I increased the CPU cores assigned and the values of `--compact.concurrency` and `--downsample.concurrency`, I wasn't seeing a corresponding increase in CPU usage %, or even in other metrics such as the number of goroutines. I had the impression that if I set a concurrency value equal to the number of cores assigned, I should see a very high CPU percentage, but instead, even with ~16 cores assigned and `--compact.concurrency=16`, CPU usage was still hovering around 10%.

I did notice a visible spike in CPU and memory usage when the compactor was creating blocks of `1h` resolution, which is expected. But my question here is: how can I get full CPU saturation throughout the compaction and downsampling cycle, not just towards the end of it?

TIA. Let me know if any data is needed from my end. I'm on version `0.32.5`.
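For context, retention per resolution is controlled by dedicated flags; the durations below are only illustrative assumptions for a setup like the one described:

```sh
# Sketch: retention settings per resolution (placeholder durations).
# 5m and 1h downsampled blocks only appear once compaction has produced
# long enough source blocks, which is why a backlog delays them.
thanos compact \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=bucket.yml \
  --wait \
  --retention.resolution-raw=30d \
  --retention.resolution-5m=180d \
  --retention.resolution-1h=365d
```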