-
Notifications
You must be signed in to change notification settings - Fork 526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Runbook: clarify MimirIngesterReachingSeriesLimit errors and retries #9410
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -41,7 +41,15 @@ If nothing obvious from the above, check for increased load: | |||||
|
||||||
### MimirIngesterReachingSeriesLimit | ||||||
|
||||||
This alert fires when the `max_series` per ingester instance limit is enabled and the actual number of in-memory series in an ingester is reaching the limit. Once the limit is reached, writes to the ingester will fail (5xx) for new series, while appending samples to existing ones will continue to succeed. | ||||||
This alert fires when the `max_series` per ingester instance limit is enabled and the actual number of in-memory series in an ingester is reaching the limit. | ||||||
The threshold is set at 80%, to give some chance to react before the limit is reached. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
Once the limit is reached, writes to the ingester will fail for new series. Appending samples to existing ones will continue to succeed. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we avoid using "writes" as a noun? Could we say "write requests" or something else, if it's more accurate. We could then also say "Appending samples to existing requests continue to succeed" for greater clarity. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We avoid using "will" in the docs. |
||||||
|
||||||
Note that the error responses sent back to the sender are classed as "server error" (5xx), which should result in a retry by the sender. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. classed = classified? |
||||||
While this situation continues, these retries will stall the flow of data, and newer data will queue up on the sender. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
If the condition is cleared in a short time, service can be restored with no data loss. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
This is different to what happens when the `max_global_series_per_user` is exceeded, which is considered a "client error" (4xx) where excess data is discarded. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
|
||||||
In case of **emergency**: | ||||||
|
||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.