Tips for slow compact on a large bucket with large blocks #4434
-
Hello, I use Thanos with a rather large bucket (Ceph object store) - 10TB total. I store metrics at the raw resolution with 30 days retention, downsampling disabled. Here's my systemd daemon for it:
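A minimal sketch of roughly what that unit's ExecStart would run, with paths and flag values assumed from the description above (30-day raw retention, downsampling disabled, continual mode) rather than taken from the actual unit file:

```bash
# Illustrative compactor invocation; paths and values are assumptions.
# --wait keeps the compactor running continually; --downsampling.disable and
# --retention.resolution-raw=30d match the retention policy described above.
/usr/local/bin/thanos compact \
  --wait \
  --data-dir=/var/thanos/compact \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --retention.resolution-raw=30d \
  --downsampling.disable
```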
The daemon runs in continual mode, but recently it has been slow to complete its compaction runs: it doesn't reach the "delete blocks" part of the run until it's too late and the storage bucket is overflowing (related issue: #2605). I recently ran a compaction without the daemon: I have a locally compiled binary of Thanos which only runs block deletions on blocks marked for deletion, described here: #2605 (comment). One thing I can do is:
What I'm looking for are tips or solutions for making this better.
Here are the metrics from the compact instance:

Any tips would be appreciated, thanks!
-
The correct link is now: https://thanos.io/tip/thanos/sharding.md/#compactor

About scalability: https://thanos.io/tip/components/compact.md/#scalability

There are various features for improving compactor performance; this umbrella issue tracks them: #4233. Your first issue link is/should be resolved by #3115.

That said, I'm not sure you are really hitting limits that couldn't already be resolved by tweaking your setup. So I'm curious which Thanos version you are using and whether you can share some stats for it (i.e. CPU & memory usage). Did you also set limits on those resources? Could you also give us a rough number of series per 2-hour block? …

If you want to run multiple compactors, you could look into the external labels. As per the docs: "This allows to assign multiple streams to each instance of compactor." For example, for the store component one could use a relabel config along the lines of the sketch below (don't use this for the compactor!).
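A minimal sketch of such a store-sharding config, assuming the store gateway's `--selector.relabel-config` flag and the special `__block_id` label (label names, modulus, and paths here are illustrative):

```bash
# Illustrative only: shard a store gateway by hashing each block ID and
# keeping one of two shards. Blocks of the same stream end up on different
# shards, which is exactly why this must not be used for the compactor.
thanos store \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --selector.relabel-config='
    - action: hashmod
      source_labels: ["__block_id"]
      target_label: shard
      modulus: 2
    - action: keep
      source_labels: ["shard"]
      regex: "0"
    '
```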
Yet this should not be used for the compactor, as it is not 'pinned' to a specific stream; it merely splits all the data over multiple shards. For the compactor you want some form of relabel config that does a regex match on the streams (i.e. on their external labels), for instance as sketched below.
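As a sketch, assuming an external label named `cluster` (the label name and regex are made up for this example), each compactor instance could keep only its own streams:

```bash
# Illustrative only: pin this compactor instance to the streams whose external
# "cluster" label matches the regex; other instances use complementary regexes
# so every stream is owned by exactly one compactor.
thanos compact \
  --wait \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --selector.relabel-config='
    - action: keep
      source_labels: ["cluster"]
      regex: "eu1-.*"
    '
```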
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
-
I am considering the same thing and I think you already gave the answer @sevagh 😄.
-
Thanks for the replies. @yeya24 I'm reading this PR you recently got merged, and I think it might help me: https://github.com/thanos-io/thanos/pull/4239/files/15acd8c8683c8ecc785ec71e4c16f89738e839b6#diff-59764a4da653d4464eac20465390033ab8abbd8b54688979727065cb389e848d

One of my issues with Ceph + Thanos is that I have 2x Prometheus pollers in a typical HA setup, so I store 2x copies of each TSDB block (slightly different due to natural differences between the two pollers). It looks like the offline deduplication you added, with the "penalty" mode intended for HA Prometheus, would shrink my Ceph bucket by roughly 50% by combining these 2x HA blocks?
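For reference, a rough sketch of how that penalty-based offline deduplication could be enabled on the compactor, assuming Thanos v0.22+ and that the two HA pollers differ only by a `replica` external label (the label name is an assumption):

```bash
# Illustrative only: vertical compaction plus penalty-based offline dedup,
# so the two HA copies of each block get merged into one.
# Replace "replica" with whatever label distinguishes the two pollers.
thanos compact \
  --wait \
  --objstore.config-file=/etc/thanos/bucket.yml \
  --compact.enable-vertical-compaction \
  --deduplication.replica-label=replica \
  --deduplication.func=penalty
```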
-
That
-
@sevagh Let me move this to discussion as it is generally a question, not an issue.
-
So, I finally had the chance to upgrade Thanos from 0.16.0-rc0 (I installed this almost a year ago, I think) to 0.22.0-rc0. Huge benefits! I'm pleased.
All of my Thanos components use the