Commit
Add a summary metric for route reload durations
I'm super curious about how long these take, as we're loading around 1M routes from the database every time the routes reload (over 1M in draft, just under in live). Al reckons about 20s, based on the logs, but it would be good to know for sure.

This adds a summary metric, which will allow us to calculate median / 90th / 95th / 99th percentile durations.

I've also added labels to the count / duration metrics so we can tell which ones are successes and failures. If you don't put success / failure labels on your duration metrics, quick failures and slow successes end up in the same series, you can't distinguish between them, and the percentiles get all mucked up.

Prometheus summaries / histograms[0] are a bit hard to wrap one's head around, but I think summary is the right choice here. Key factors:

1) With histograms, you have to specify the bucket boundaries you care about up front (and we don't know how long these reloads take, so that's hard)
2) Summaries let you specify which quantiles you want up front, with the calculation happening "on the client side" (i.e. inside router, before things go to prometheus), which is more expensive at observation time
3) We're not making many observations for this metric, because we only reload routes once every few seconds (max), so the cost of calculating the summary on the client side should be small

The Objectives map sets the quantiles we care about, and an allowed amount of error for each. In this case, by setting `0.5: 0.01` I'm saying "bucket things so I get a quantile that's between 0.49 and 0.51", and by setting `0.99: 0.005` I'm saying "bucket things so I get a quantile that's between 0.985 and 0.995". They're not exact for performance reasons.[1]

[0] - https://prometheus.io/docs/practices/histograms/
[1] - https://grafana.com/blog/2022/03/01/how-summary-metrics-work-in-prometheus/#limiting-the-error-an-upper-bound-for-delta
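
As a rough illustration of two points from the message (the error window implied by each Objectives entry, and why success / failure labels matter for durations), here is a small stdlib-only Go sketch. It is not the router code and does not use the Prometheus client: the helper names `rankWindow` and `exactQuantile` and the sample durations are made up for illustration, and the quantile is computed exactly by sorting rather than with the streaming estimate a real summary uses.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// rankWindow returns the quantile range allowed by one objective entry
// {q: eps}: the reported value may be the true value at any quantile
// between q-eps and q+eps (clamped to [0, 1]).
func rankWindow(q, eps float64) (lo, hi float64) {
	return math.Max(0, q-eps), math.Min(1, q+eps)
}

// exactQuantile returns the nearest-rank q-quantile of xs.
// It assumes xs is non-empty and 0 <= q <= 1.
func exactQuantile(xs []float64, q float64) float64 {
	s := append([]float64(nil), xs...) // copy so the caller's slice is not reordered
	sort.Float64s(s)
	idx := int(math.Ceil(q*float64(len(s)))) - 1
	if idx < 0 {
		idx = 0
	}
	return s[idx]
}

func main() {
	// Hypothetical objectives, mirroring the two entries discussed above.
	objectives := map[float64]float64{0.5: 0.01, 0.99: 0.005}
	for _, q := range []float64{0.5, 0.99} {
		lo, hi := rankWindow(q, objectives[q])
		fmt.Printf("objective %.2f: quantile between %.3f and %.3f\n", q, lo, hi)
	}

	// Made-up reload durations in seconds, keyed by a success/failure label.
	byLabel := map[string][]float64{
		"success": {18, 20, 22},
		"failure": {0.1, 0.2, 0.3, 0.4},
	}
	mixed := append(append([]float64(nil), byLabel["success"]...), byLabel["failure"]...)

	// Without labels, quick failures drag the median far below the real reload time.
	fmt.Printf("median, unlabelled:   %.1fs\n", exactQuantile(mixed, 0.5))
	fmt.Printf("median, success only: %.1fs\n", exactQuantile(byLabel["success"], 0.5))
}
```

With these made-up numbers the unlabelled median lands among the fast failures, while the success-only median reflects the actual reload time, which is the reason for labelling the duration metric.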