Releases: grafana/mimir
2.7.1
This release contains 177 PRs from 43 authors, including new contributors Bartosz Cisek, dggmsa, gmintoco, Ihor Urazov, James Ross, Jean-Philippe Quéméner, Jon Gutschon, l3ioo, lpugoy, Nicolás Pazos, Oscar, Reto Kupferschmid, ying-jeanne. Thank you!
Grafana Mimir version 2.7.1 release notes
Grafana Labs is excited to announce version 2.7.1 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Note: During the release process, version 2.7.0 was tagged too early, before completing the release checklist and production testing. Release 2.7.1 doesn't include any code changes since 2.7.0, but now has proper release notes, published documentation, and has been fully tested in our production environment.
Features and enhancements
- Store-gateway streaming enabled by default
  The new default value of `5000` for `-blocks-storage.bucket-store.batch-series-size` enables store-gateway streaming in the default configuration. This means that series are loaded from object storage in batches, rather than all being buffered in memory before they are returned to the querier. Enabling streaming can reduce memory utilization peaks in the store-gateway.
- Store-gateway index header reader no longer uses mmap by default
  Together with streaming being enabled in the store-gateway, this change contributes to more efficient memory usage. See the Important changes section for more details.
- Support for the `keep_firing_for` option in the ruler configuration
  This new option determines how long an alert keeps firing after the ruler expression stops returning results.
- More efficient chunks fetching and caching
  Enable this with the new experimental feature flag `-blocks-storage.bucket-store.chunks-cache.fine-grained-chunks-caching-enabled=true`. This should reduce the CPU usage, memory utilization, and receive bandwidth of a store-gateway.
- Experimental query sharding improvements
  A new configuration parameter, `-query-frontend.query-sharding-target-series-per-shard`, allows query sharding to take into account the cardinality of similar requests executed previously when computing the maximum number of shards to use. If you want to try it out, we recommend starting with a value of `2500`.
- Experimental support for native histogram ingestion
  Native histograms can now be ingested. The new per-tenant limit `-ingester.native-histograms-ingestion-enabled` controls whether native histograms are stored or ignored. Support for querying native histograms is not complete yet and is expected to be available in the next release.
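The query sharding parameter above can be reasoned about with a small calculation. The sketch below is an assumption about the general idea, not Mimir's actual code: it divides the cardinality observed for similar earlier requests by the target series per shard, rounds up, and caps the result at a configured maximum. The function name `shardsForCardinality`, the `maxShards` cap, and the fallback when no estimate exists are all hypothetical.

```go
package main

import "fmt"

// shardsForCardinality is a hypothetical helper, not Mimir's API. It estimates
// how many shards a query needs so that each shard handles roughly
// targetSeriesPerShard series, given the cardinality observed for similar
// queries, and never exceeds maxShards.
func shardsForCardinality(observedSeries, targetSeriesPerShard, maxShards int) int {
	if targetSeriesPerShard <= 0 || observedSeries <= 0 {
		// No usable estimate: fall back to the configured maximum.
		return maxShards
	}
	// Integer ceiling division; the result is at least 1 here.
	shards := (observedSeries + targetSeriesPerShard - 1) / targetSeriesPerShard
	if shards > maxShards {
		shards = maxShards
	}
	return shards
}

func main() {
	// Using the recommended starting value of 2500 series per shard.
	fmt.Println(shardsForCardinality(6000, 2500, 16))   // 3
	fmt.Println(shardsForCardinality(100000, 2500, 16)) // 16 (capped)
	fmt.Println(shardsForCardinality(1000, 2500, 16))   // 1
}
```

Low-cardinality queries then avoid the overhead of fanning out to many shards, while high-cardinality ones still hit the cap.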
Alertmanager improvements
- New metrics
  The following upstream metrics are now exposed:
  - `cortex_alertmanager_dispatcher_aggregation_groups`
  - `cortex_alertmanager_dispatcher_alert_processing_duration_seconds`
Helm chart improvements
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the Grafana Mimir Helm chart documentation.
Important changes
In Grafana Mimir 2.7, the default values of the following configuration options have changed:
- `-blocks-storage.bucket-store.batch-series-size` is now enabled by default with a value of `5000`.
- `-ruler.evaluation-delay-duration` has changed from `0` to `1m`.
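To illustrate why a non-zero `batch-series-size` bounds store-gateway memory, here is a minimal sketch. It assumes a generic iterator of series: results are handed to a callback in fixed-size batches, so only one batch is held in memory at a time instead of the full result set. The `streamInBatches` and `sliceIter` functions are hypothetical and do not reflect the store-gateway's internal API.

```go
package main

import "fmt"

// streamInBatches is a hypothetical sketch: it reads series from next() and
// passes them to send() in batches of at most batchSize, so only one batch is
// buffered at a time instead of the whole result set.
func streamInBatches(next func() (string, bool), batchSize int, send func([]string)) {
	batch := make([]string, 0, batchSize)
	for {
		s, ok := next()
		if !ok {
			break
		}
		batch = append(batch, s)
		if len(batch) == batchSize {
			send(batch)
			// Start a fresh slice so the receiver may keep the previous batch.
			batch = make([]string, 0, batchSize)
		}
	}
	if len(batch) > 0 {
		send(batch) // flush the final, partially filled batch
	}
}

// sliceIter returns a next() function over a fixed slice, for demonstration.
func sliceIter(items []string) func() (string, bool) {
	i := 0
	return func() (string, bool) {
		if i >= len(items) {
			return "", false
		}
		i++
		return items[i-1], true
	}
}

func main() {
	series := []string{"s1", "s2", "s3", "s4", "s5"}
	streamInBatches(sliceIter(series), 2, func(b []string) {
		fmt.Println(b)
	})
	// Prints [s1 s2], [s3 s4], and [s5] on separate lines.
}
```

Peak memory in this model scales with the batch size rather than with the total number of series returned.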
In Grafana Mimir 2.7, the following configuration options are now deprecated:
- `-blocks-storage.bucket-store.chunks-cache.subrange-size`, since there's no benefit to changing the default of `16000`.
- `-blocks-storage.bucket-store.consistency-delay` has been deprecated and will be removed in Mimir 2.9.
- `-compactor.consistency-delay` has been deprecated and will be removed in Mimir 2.9.
- `-ingester.ring.readiness-check-ring-health` has been deprecated and will be removed in Mimir 2.9.
In Grafana Mimir 2.7, the following options, metrics, and labels have been removed:
- Experimental support for ephemeral storage introduced in Mimir 2.6.0 has been removed.
  - The following options are no longer available:
    - `-blocks-storage.ephemeral-tsdb.*`
    - `-distributor.ephemeral-series-enabled`
    - `-distributor.ephemeral-series-matchers`
    - `-ingester.max-ephemeral-series-per-user`
    - `-ingester.instance-limits.max-ephemeral-series`
  - The following metrics have been removed:
    - `cortex_ingester_ephemeral_series`
    - `cortex_ingester_ephemeral_series_created_total`
    - `cortex_ingester_ephemeral_series_removed_total`
    - `cortex_ingester_ingested_ephemeral_samples_total`
    - `cortex_ingester_ingested_ephemeral_samples_failures_total`
    - `cortex_ingester_memory_ephemeral_users`
    - `cortex_ingester_queries_ephemeral_total`
    - `cortex_ingester_queried_ephemeral_samples`
    - `cortex_ingester_queried_ephemeral_series`
  - Additionally, querying using the `{__mimir_storage__="ephemeral"}` selector no longer works. All label values with the `ephemeral-` prefix within the `reason` label of the `cortex_discarded_samples_total` metric are no longer available.
- The store-gateway default index header reader no longer uses mmap, and the mmap-based index header reader has been removed. The following flags have been changed:
  - `-blocks-storage.bucket-store.index-header.map-populate-enabled` has been removed.
  - `-blocks-storage.bucket-store.index-header.stream-reader-enabled` has been removed.
  - `-blocks-storage.bucket-store.index-header.stream-reader-max-idle-file-handles` has been renamed to `-blocks-storage.bucket-store.index-header.max-idle-file-handles`, and the corresponding configuration file option has been renamed from `stream_reader_max_idle_file_handles` to `max_idle_file_handles`.
Bug fixes
- Store-gateway: return Canceled rather than Aborted or Internal error when the calling querier cancels a label names or values request, and return Internal if processing the request fails for another reason. PR 4061
- Querier: track canceled requests with status code 499 in the metrics instead of 503 or 422. PR 4099
- Ingester: compact out-of-order data during /ingester/flush or when TSDB is idle. PR 4180
- Ingester: conversion of global limits max-series-per-user, max-series-per-metric, max-metadata-per-user and max-metadata-per-metric into corresponding local limits now takes into account the number of ingesters in each zone. PR 4238
- Ingester: track cortex_ingester_memory_series metric consistently with cortex_ingester_memory_series_created_total and cortex_ingester_memory_series_removed_total. PR 4312
- Querier: fixed a bug which was incorrectly matching series with regular expression label matchers with begin/end anchors in the middle of the regular expression. PR 4340
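For context on the regular-expression fix above: Prometheus-style label matchers are fully anchored, which is commonly implemented by wrapping the expression as `^(?:expr)$`. The sketch below assumes that convention and uses Go's standard `regexp` package to show how anchors placed in the middle of an expression (here `foo$|^bar`) must still be honored. It illustrates the expected matching semantics only; it is not Mimir's matcher code, and `anchoredMatch` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchoredMatch (hypothetical helper) compiles expr the way Prometheus-style
// matchers treat it: implicitly anchored at both ends of the label value.
func anchoredMatch(expr, value string) (bool, error) {
	re, err := regexp.Compile("^(?:" + expr + ")$")
	if err != nil {
		return false, err
	}
	return re.MatchString(value), nil
}

func main() {
	// "foo$|^bar" contains anchors in the middle of the expression.
	for _, v := range []string{"foo", "bar", "foobar", "barfoo"} {
		ok, _ := anchoredMatch("foo$|^bar", v)
		fmt.Printf("%-6s %v\n", v, ok)
	}
}
```

Only the exact values `foo` and `bar` match; any optimization that rewrites the expression must preserve this.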
Changelog
2.7.1
Grafana Mimir
- [CHANGE] Ingester: the configuration parameter `-ingester.ring.readiness-check-ring-health` has been deprecated and will be removed in Mimir 2.9. #4422
- [CHANGE] Ruler: changed default value of `-ruler.evaluation-delay-duration` option from 0 to 1m. #4250
- [CHANGE] Querier: Errors with status code `422` coming from the store-gateway are propagated and no longer converted to the consistency check error. #4100
- [CHANGE] Store-gateway: When a query hits the `max_fetched_chunks_per_query` and `max_fetched_series_per_query` limits, an error with the status code `422` is created and returned. #4056
- [CHANGE] Packaging: Migrate FPM packaging solution to NFPM. Rationalize package dependencies and add packages for all binaries. #3911
- [CHANGE] Store-gateway: Deprecate flag `-blocks-storage.bucket-store.chunks-cache.subrange-size` since there's no benefit to changing the default of `16000`. #4135
- [CHANGE] Experimental support for ephemeral storage introduced in Mimir 2.6.0 has been removed. The following options are no longer available: #4252
  - `-blocks-storage.ephemeral-tsdb.*`
  - `-distributor.ephemeral-series-enabled`
  - `-distributor.ephemeral-series-matchers`
  - `-ingester.max-ephemeral-series-per-user`
  - `-ingester.instance-limits.max-ephemeral-series`

  Querying using the `{__mimir_storage__="ephemeral"}` selector no longer works. All label values with the `ephemeral-` prefix in the `reason` label of the `cortex_discarded_samples_total` metric are no longer available. The following metrics have been removed:
  - `cortex_ingester_ephemeral_series`
  - `cortex_ingester_ephemeral_series_created_total`
  - `cortex_ingester_ephemeral_series_removed_total`
  - `cortex_ingester_ingested_ephemeral_samples_total`
  - `cortex_ingester_ingested_ephemeral_samples_failures_total`
  - `cortex_ingester_memory_ephemeral_users`
  - `cortex_ingester_queries_ephemeral_total`
  - `cortex_ingester_queried_ephemeral_samples`
  - `cortex_ingester_queried_ephemeral_series`
- [CHANGE] Store-gateway: use mmap-less index-header reader by default and remove mmap-based index header reader. The following flags have changed: #4280
  - `-blocks-storage.bucket-store.index-header.map-populate-enabled` has been removed
  - `-blocks-storage.bucket-store.index-header.stream-reader-enabled` has been removed
  - `-blocks-storage.bucket-store.index-header.stream-reader-max-idle-file-handles` has been renamed to `-blocks-storage.bucket-store.index-header.max-idle-file-handles`, and the corresponding configuration file option has been renamed from `stream_reader_max_idle_file_handles` to `max_idle_file_handles`
- [CHANGE] Store-gateway: the streaming store-gateway is now enabled by default. The new default setting for `-blocks-storage.bucket-store.batc...
2.6.0
This release contains 259 PRs from 40 authors, including new contributors breadly7, bubu11e, Đurica Yuri Nikolić, Felix Beuke, Jack, klagroix, Martin Chodur, Ørjan Ommundsen, Sascha Sternheim, Wu Zhiyuan. Thank you!
Grafana Mimir version 2.6.0 release notes
Grafana Labs is excited to announce version 2.6 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Features and enhancements
- Lower memory usage in store-gateway by streaming series results
  The store-gateway can now stream results back to the querier instead of buffering them. This is expected to greatly reduce peak memory consumption while keeping latency the same. This is still an experimental feature, but Grafana Labs is already running it in production and there are no known issues. You can enable this feature by setting the `-blocks-storage.bucket-store.batch-series-size` configuration option (if you want to try it out, we recommend setting it to `5000`).
- Improved stability in store-gateway by removing mmap usage
  The store-gateway can now use an alternate code path to read index-headers that does not use memory-mapped files. This is expected to improve the stability of the store-gateway. This is still an experimental feature, but Grafana Labs is already running it in production and there are no known issues. You can enable this feature by setting `-blocks-storage.bucket-store.index-header.stream-reader-enabled=true`.
Alertmanager improvements
- Webex support
  Alertmanager can now use Webex to send alerts.
- `tenantID` template function
  A new template function, `tenantID`, returning the ID of the tenant owning the alert, has been added.
- `grafanaExploreURL` template function
  A new template function, `grafanaExploreURL`, returning the URL to the Grafana Explore page with a range query, has been added.
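Template functions such as `tenantID` are typically exposed to Go templates through the `text/template` `FuncMap`. The sketch below assumes a zero-argument `tenantID` function and a hypothetical `renderForTenant` helper that binds it to one tenant. The actual signature and registration inside Mimir's Alertmanager may differ, and `grafanaExploreURL` is not modeled here.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderForTenant (hypothetical) parses tmpl with a tenantID function that
// returns the given tenant, then executes it against data.
func renderForTenant(tenant, tmpl string, data any) (string, error) {
	// Funcs must be registered before Parse so the parser knows the name.
	t, err := template.New("alert").Funcs(template.FuncMap{
		"tenantID": func() string { return tenant },
	}).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderForTenant("team-a", `[{{ tenantID }}] {{ .Alertname }} is firing`,
		map[string]string{"Alertname": "HighLatency"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [team-a] HighLatency is firing
}
```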
Helm chart improvements
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the corresponding documentation for more information.
Important changes
In Grafana Mimir 2.6 we have removed the following previously deprecated or experimental configuration options:
- The CLI flag `-blocks-storage.bucket-store.max-concurrent-reject-over-limit` and its respective YAML configuration option `blocks_storage.bucket_store.max_concurrent_reject_over_limit`.
- The CLI flag `-query-frontend.align-querier-with-step` and its respective YAML configuration option `frontend.align_querier_with_step`.
The following configuration options are deprecated and will be removed in Grafana Mimir 2.8:
- The CLI flag `-store.max-query-length` and its respective YAML configuration option `limits.max_query_length` have been replaced with `-querier.max-partial-query-length` and `limits.max_partial_query_length`.
The following experimental options and features are now stable:
- The CLI flag `-query-frontend.max-total-query-length` and its respective YAML configuration option `limits.max_total_query_length`.
- The CLI flags `-distributor.request-rate-limit` and `-distributor.request-burst-limit` and their respective YAML configuration options `limits.request_rate_limit` and `limits.request_rate_burst`.
- The CLI flag `-ingester.max-global-exemplars-per-user` and its respective YAML configuration option `limits.max_global_exemplars_per_user`.
- The CLI flag `-ingester.tsdb-config-update-period` and its respective YAML configuration option `ingester.tsdb_config_update_period`.
- The API endpoint `/api/v1/query_exemplars`.
Bug fixes
- Alertmanager: Fix template spurious deletion with relative data dir. PR 3604
- Security: Update prometheus/exporter-toolkit for CVE-2022-46146. PR 3675
- Security: Update golang.org/x/net for CVE-2022-41717. PR 3755
- Debian package: Fix post-install, environment file path and user creation. PR 3720
- Memberlist: Fix panic during Mimir startup when Mimir receives gossip message before it's ready. PR 3746
- Update `github.com/thanos-io/objstore` to address an issue with Multipart PUT on S3-compatible object storage. PR 3802 PR 3821
- Querier: Canceled requests are no longer reported as "consistency check" failures. PR 3837 PR 3927
- Distributor: Don't panic when `metric_relabel_configs` in overrides contains a null element. PR 3868
- Ingester, Compactor: Fix panic that can occur when compaction fails. PR 3955
Changelog
2.6.0
Grafana Mimir
- [CHANGE] Querier: Introduce `-querier.max-partial-query-length` to limit the time range for partial queries at the querier level and deprecate `-store.max-query-length`. #3825 #4017
- [CHANGE] Store-gateway: Remove experimental `-blocks-storage.bucket-store.max-concurrent-reject-over-limit` flag. #3706
- [CHANGE] Ingester: If shipping is enabled, block retention will now be relative to the upload time to cloud storage. If shipping is disabled, block retention will be relative to the creation time of the block instead of the mintime of the last block created. #3816
- [CHANGE] Query-frontend: Deprecated CLI flag `-query-frontend.align-querier-with-step` has been removed. #3982
- [FEATURE] Store-gateway: streaming of series. The store-gateway can now stream results back to the querier instead of buffering them. This is expected to greatly reduce peak memory consumption while keeping latency the same. You can enable this feature by setting `-blocks-storage.bucket-store.batch-series-size` to a value in the high thousands (5000-10000). This is still an experimental feature and is subject to a changing API and instability. #3540 #3546 #3587 #3606 #3611 #3620 #3645 #3355 #3697 #3666 #3687 #3728 #3739 #3751 #3779 #3839
- [FEATURE] Alertmanager: Added support for the Webex receiver. #3758
- [FEATURE] Limits: Added the `-validation.separate-metrics-group-label` flag. This allows further separation of the `cortex_discarded_samples_total` metric by an additional `group` label, which is configured by this flag to be the value of a specific label on an incoming time series. Active groups are tracked, and inactive groups are cleaned up on a defined interval. The maximum number of groups tracked is controlled by the `-max-separate-metrics-groups-per-user` flag. #3439
- [FEATURE] Overrides-exporter: Added experimental ring support to overrides-exporter via `-overrides-exporter.ring.enabled`. When enabled, the ring is used to establish a leader replica for the export of limit override metrics. #3908 #3953
- [FEATURE] Ephemeral storage (experimental): Mimir can now accept samples into "ephemeral storage". Such samples are available for querying for a short amount of time (`-blocks-storage.ephemeral-tsdb.retention-period`, defaults to 10 minutes), and then removed from memory. To use ephemeral storage, the distributor must be configured with the `-distributor.ephemeral-series-enabled` option. Series matching `-distributor.ephemeral-series-matchers` will be marked for storing into ephemeral storage in ingesters. Each tenant needs to have ephemeral storage enabled by using the `-ingester.max-ephemeral-series-per-user` limit, which defaults to 0 (no ephemeral storage). Ingesters have a new `-ingester.instance-limits.max-ephemeral-series` limit for the total number of series in ephemeral storage across all tenants. If ingestion of samples into ephemeral storage fails, the `cortex_discarded_samples_total` metric will use values prefixed with `ephemeral-` for the `reason` label. Querying of ephemeral storage is possible by using `{__mimir_storage__="ephemeral"}` as the metric selector. The following new metrics related to ephemeral storage are introduced: #3897 #3922 #3961 #3997 #4004
  - `cortex_ingester_ephemeral_series`
  - `cortex_ingester_ephemeral_series_created_total`
  - `cortex_ingester_ephemeral_series_removed_total`
  - `cortex_ingester_ingested_ephemeral_samples_total`
  - `cortex_ingester_ingested_ephemeral_samples_failures_total`
  - `cortex_ingester_memory_ephemeral_users`
  - `cortex_ingester_queries_ephemeral_total`
  - `cortex_ingester_queried_ephemeral_samples`
  - `cortex_ingester_queried_ephemeral_series`
- [ENHANCEMENT] Added new metric `thanos_shipper_last_successful_upload_time`: Unix timestamp (in seconds) of the last successful TSDB block uploaded to the bucket. #3627
- [ENHANCEMENT] Ruler: Added `-ruler.alertmanager-client.tls-enabled` configuration for alertmanager client. #3432 #3597
- [ENHANCEMENT] Activity tracker logs now have `component=activity-tracker` label. #3556
- [ENHANCEMENT] Distributor: remove labels with empty values. #2439
- [ENHANCEMENT] Query-frontend: track query HTTP requests in the Activity Tracker. #3561
- [ENHANCEMENT] Store-gateway: Add experimental alternate implementation of index-header reader that does not use memory-mapped files. The index-header reader is expected to improve stability of the store-gateway. You can enable this implementation with the flag `-blocks-storage.bucket-store.index-header.stream-reader-enabled`. #3639 #3691 #3703 #3742 #3785 #3787 #3797
- [ENHANCEMENT] Query-scheduler: add `cortex_query_scheduler_cancelled_requests_total` metric to track the number of requests that are already cancelled when dequeued. #3696
- [ENHANCEMENT] Store-gateway: add `cortex_bucket_store_partitioner_extended_ranges_total` metric to keep ...
2.6.0-rc.0
This release contains 255 PRs from 40 authors, including new contributors breadly7, bubu11e, Đurica Yuri Nikolić, Felix Beuke, Jack, klagroix, Martin Chodur, Ørjan Ommundsen, Sascha Sternheim, Wu Zhiyuan. Thank you!
Grafana Mimir version 2.6.0-rc.0 release notes
Grafana Labs is excited to announce version 2.6.0-rc.0 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Features and enhancements
- Lower memory usage in store-gateway by streaming series results
  The store-gateway can now stream results back to the querier instead of buffering them. This is expected to greatly reduce peak memory consumption while keeping latency the same. This is still an experimental feature, but Grafana Labs is already running it in production and there are no known issues. You can enable this feature by setting the `-blocks-storage.bucket-store.batch-series-size` configuration option (if you want to try it out, we recommend setting it to `5000`).
- Improved stability in store-gateway by removing mmap usage
  The store-gateway can now use an alternate code path to read index-headers that does not use memory-mapped files. This is expected to improve the stability of the store-gateway. This is still an experimental feature, but Grafana Labs is already running it in production and there are no known issues. You can enable this feature by setting `-blocks-storage.bucket-store.index-header.stream-reader-enabled=true`.
Alertmanager improvements
- Webex support
  Alertmanager can now use Webex to send alerts.
- `tenantID` template function
  A new template function, `tenantID`, returning the ID of the tenant owning the alert, has been added.
- `grafanaExploreURL` template function
  A new template function, `grafanaExploreURL`, returning the URL to the Grafana Explore page with a range query, has been added.
Helm chart improvements
The Grafana Mimir and Grafana Enterprise Metrics Helm chart is now released independently. See the corresponding documentation for more information.
Important changes
In Grafana Mimir 2.6 we have removed the following previously deprecated or experimental configuration options:
- The CLI flag `-blocks-storage.bucket-store.max-concurrent-reject-over-limit` and its respective YAML configuration option `blocks_storage.bucket_store.max_concurrent_reject_over_limit`.
- The CLI flag `-query-frontend.align-querier-with-step` and its respective YAML configuration option `frontend.align_querier_with_step`.
The following configuration options are deprecated and will be removed in Grafana Mimir 2.8:
- The CLI flag `-store.max-query-length` and its respective YAML configuration option `limits.max_query_length` have been replaced with `-querier.max-partial-query-length` and `limits.max_partial_query_length`.
The following experimental options and features are now stable:
- The CLI flag `-query-frontend.max-total-query-length` and its respective YAML configuration option `limits.max_total_query_length`.
- The CLI flags `-distributor.request-rate-limit` and `-distributor.request-burst-limit` and their respective YAML configuration options `limits.request_rate_limit` and `limits.request_rate_burst`.
- The CLI flag `-ingester.max-global-exemplars-per-user` and its respective YAML configuration option `limits.max_global_exemplars_per_user`.
- The CLI flag `-ingester.tsdb-config-update-period` and its respective YAML configuration option `ingester.tsdb_config_update_period`.
- The API endpoint `/api/v1/query_exemplars`.
Bug fixes
- Alertmanager: Fix template spurious deletion with relative data dir. PR 3604
- Security: Update prometheus/exporter-toolkit for CVE-2022-46146. PR 3675
- Security: Update golang.org/x/net for CVE-2022-41717. PR 3755
- Debian package: Fix post-install, environment file path and user creation. PR 3720
- Memberlist: Fix panic during Mimir startup when Mimir receives gossip message before it's ready. PR 3746
- Update `github.com/thanos-io/objstore` to address an issue with Multipart PUT on S3-compatible object storage. PR 3802 PR 3821
- Querier: Canceled requests are no longer reported as "consistency check" failures. PR 3837 PR 3927
- Distributor: Don't panic when `metric_relabel_configs` in overrides contains a null element. PR 3868
- Ingester, Compactor: Fix panic that can occur when compaction fails. PR 3955
Changelog
2.6.0-rc.0
Grafana Mimir
- [CHANGE] Querier: Introduce `-querier.max-partial-query-length` to limit the time range for partial queries at the querier level and deprecate `-store.max-query-length`. #3825 #4017
- [CHANGE] Store-gateway: Remove experimental `-blocks-storage.bucket-store.max-concurrent-reject-over-limit` flag. #3706
- [CHANGE] Ingester: If shipping is enabled, block retention will now be relative to the upload time to cloud storage. If shipping is disabled, block retention will be relative to the creation time of the block instead of the mintime of the last block created. #3816
- [CHANGE] Query-frontend: Deprecated CLI flag `-query-frontend.align-querier-with-step` has been removed. #3982
- [FEATURE] Store-gateway: streaming of series. The store-gateway can now stream results back to the querier instead of buffering them. This is expected to greatly reduce peak memory consumption while keeping latency the same. You can enable this feature by setting `-blocks-storage.bucket-store.batch-series-size` to a value in the high thousands (5000-10000). This is still an experimental feature and is subject to a changing API and instability. #3540 #3546 #3587 #3606 #3611 #3620 #3645 #3355 #3697 #3666 #3687 #3728 #3739 #3751 #3779 #3839
- [FEATURE] Alertmanager: Added support for the Webex receiver. #3758
- [FEATURE] Limits: Added the `-validation.separate-metrics-group-label` flag. This allows further separation of the `cortex_discarded_samples_total` metric by an additional `group` label, which is configured by this flag to be the value of a specific label on an incoming time series. Active groups are tracked, and inactive groups are cleaned up on a defined interval. The maximum number of groups tracked is controlled by the `-max-separate-metrics-groups-per-user` flag. #3439
- [FEATURE] Overrides-exporter: Added experimental ring support to overrides-exporter via `-overrides-exporter.ring.enabled`. When enabled, the ring is used to establish a leader replica for the export of limit override metrics. #3908 #3953
- [FEATURE] Ephemeral storage (experimental): Mimir can now accept samples into "ephemeral storage". Such samples are available for querying for a short amount of time (`-blocks-storage.ephemeral-tsdb.retention-period`, defaults to 10 minutes), and then removed from memory. To use ephemeral storage, the distributor must be configured with the `-distributor.ephemeral-series-enabled` option. Series matching `-distributor.ephemeral-series-matchers` will be marked for storing into ephemeral storage in ingesters. Each tenant needs to have ephemeral storage enabled by using the `-ingester.max-ephemeral-series-per-user` limit, which defaults to 0 (no ephemeral storage). Ingesters have a new `-ingester.instance-limits.max-ephemeral-series` limit for the total number of series in ephemeral storage across all tenants. If ingestion of samples into ephemeral storage fails, the `cortex_discarded_samples_total` metric will use values prefixed with `ephemeral-` for the `reason` label. Querying of ephemeral storage is possible by using `{__mimir_storage__="ephemeral"}` as the metric selector. The following new metrics related to ephemeral storage are introduced: #3897 #3922 #3961 #3997 #4004
  - `cortex_ingester_ephemeral_series`
  - `cortex_ingester_ephemeral_series_created_total`
  - `cortex_ingester_ephemeral_series_removed_total`
  - `cortex_ingester_ingested_ephemeral_samples_total`
  - `cortex_ingester_ingested_ephemeral_samples_failures_total`
  - `cortex_ingester_memory_ephemeral_users`
  - `cortex_ingester_queries_ephemeral_total`
  - `cortex_ingester_queried_ephemeral_samples`
  - `cortex_ingester_queried_ephemeral_series`
- [ENHANCEMENT] Added new metric `thanos_shipper_last_successful_upload_time`: Unix timestamp (in seconds) of the last successful TSDB block uploaded to the bucket. #3627
- [ENHANCEMENT] Ruler: Added `-ruler.alertmanager-client.tls-enabled` configuration for alertmanager client. #3432 #3597
- [ENHANCEMENT] Activity tracker logs now have `component=activity-tracker` label. #3556
- [ENHANCEMENT] Distributor: remove labels with empty values. #2439
- [ENHANCEMENT] Query-frontend: track query HTTP requests in the Activity Tracker. #3561
- [ENHANCEMENT] Store-gateway: Add experimental alternate implementation of index-header reader that does not use memory-mapped files. The index-header reader is expected to improve stability of the store-gateway. You can enable this implementation with the flag `-blocks-storage.bucket-store.index-header.stream-reader-enabled`. #3639 #3691 #3703 #3742 #3785 #3787 #3797
- [ENHANCEMENT] Query-scheduler: add `cortex_query_scheduler_cancelled_requests_total` metric to track the number of requests that are already cancelled when dequeued. #3696
- [ENHANCEMENT] Store-gateway: add `cortex_bucket_store_partitioner_ex...
2.5.0
This release contains 230 PRs from 43 authors, including new contributors Aldo D'Aquino, Anıl Mısırlıoğlu, Charles Korn, Danny Staple, Dylan Crees, Eduardo Silvi, FG, Jesse Weaver, KarlisAG, Leegin-darknight, Rohan Kumar, Wille Faler, Y.Horie, manohar-koukuntla, paulroche, songjiayang, Éamon Ryan. Thank you!
Grafana Mimir version 2.5 release notes
Grafana Labs is excited to announce version 2.5 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Features and enhancements
- Alertmanager Discord support
  Alertmanager can now be configured to send alerts to Discord channels.
- Configurable TLS minimum version and cipher suites
  We added the flags `-server.tls-min-version` and `-server.tls-cipher-suites`, which can be used to define the minimum TLS version and the supported cipher suites in all HTTP and gRPC servers in Mimir.
- Lower memory usage in store-gateway, ingester and alertmanager
  We made various changes related to how index lookups are performed and how the active series custom trackers are implemented, which results in better performance and lower overall memory usage in the store-gateway and ingester.
  We also optimized the alertmanager, which results in a 50% reduction in memory usage in use cases with larger numbers of tenants.
- Improved Mimir dashboards
  We added two new dashboards named `Mimir / Overview resources` and `Mimir / Overview networking`. Furthermore, we have made various improvements to the following existing dashboards:
  - `Mimir / Overview`: Add "remote read", "metadata", and "exemplar" queries.
  - `Mimir / Writes`: Add optional row about the distributor's new forwarding feature.
  - `Mimir / Tenants`: Add insights into the read path.
Helm chart improvements
- Zone aware replication
  Helm now supports deploying the ingesters and store-gateways in different availability zones. The replication is also zone-aware, so multiple instances of one zone can fail without any service interruption, and rollouts can be performed faster because many instances of each zone can be restarted together, as opposed to all of them restarting in sequence. This is a breaking change; for details on how to upgrade, please review the Helm changelog.
- Running without root privileges
  All Mimir, GEM and Agent processes no longer require root privileges to run.
- Unified reverse proxy (`gateway`) configuration for Mimir and GEM
  This change allows for an easier upgrade path from Mimir to GEM, without any downtime. The unified configuration also makes it possible to autoscale the GEM gateway pods, and it supports OpenShift Route. The change also deprecates the `nginx` section in the configuration. The section will be removed in release `7.0.0`.
- Updated MinIO
  The MinIO sub-chart was updated from `4.x` to `5.0.0`. Note that this update inherits a breaking change because the MinIO gateway mode was removed.
- Updated sizing plans
  We updated our sizing plans to better reflect how we recommend running Mimir and GEM in production. Note that this includes a breaking change for users of the "small" plan; more details can be found in the Helm changelog.
- Various quality of life improvements
  - Rollout strategies without downtime
  - Read path and compactor configuration refresh, providing better default settings
  - OTLP ingestion support in the Nginx configuration
  - A default configuration for alertmanager, so that the user interface and the sending of alerts from the ruler work out of the box
Bug fixes
- Flusher: Added `Overrides` as a dependency to prevent panics when starting with `-target=flusher`. PR 3151
- Query-frontend: properly close gRPC streams to the query-scheduler to stop memory and goroutine leaks. PR 3302
- Ruler: persist evaluation delay configured in the rulegroup. PR 3392
- Fix panics in OTLP ingest path when parse errors occur. PR 3538
Changelog
2.5.0
Grafana Mimir
- [CHANGE] Flag `-azure.msi-resource` is now ignored, and will be removed in Mimir 2.7. This setting is now made automatically by Azure. #2682
- [CHANGE] Experimental flag `-blocks-storage.tsdb.out-of-order-capacity-min` has been removed. #3261
- [CHANGE] Distributor: Wrap errors from pushing to ingesters with useful context, for example clarifying timeouts. #3307
- [CHANGE] The default value of `-server.http-write-timeout` has changed from 30s to 2m. #3346
- [CHANGE] Reduce period of health checks in connection pools for querier->store-gateway, ruler->ruler, and alertmanager->alertmanager clients to 10s. This reduces the time to fail a gRPC call when the remote stops responding. #3168
- [CHANGE] Hide TSDB block ranges period config from doc and mark it experimental. #3518
- [FEATURE] Alertmanager: added Discord support. #3309
- [ENHANCEMENT] Added `-server.tls-min-version` and `-server.tls-cipher-suites` flags to configure cipher suites and min TLS version supported by HTTP and gRPC servers. #2898
- [ENHANCEMENT] Distributor: Add age filter to forwarding functionality, to not forward samples which are older than a defined duration. If such samples are not ingested, `cortex_discarded_samples_total{reason="forwarded-sample-too-old"}` is increased. #3049 #3113
- [ENHANCEMENT] Store-gateway: Reduce memory allocation when generating ids in index cache. #3179
- [ENHANCEMENT] Query-frontend: truncate queries based on the configured creation grace period (`--validation.create-grace-period`) to avoid querying too far into the future. #3172
- [ENHANCEMENT] Ingester: Reduce activity tracker memory allocation. #3203
- [ENHANCEMENT] Query-frontend: Log more detailed information in the case of a failed query. #3190
- [ENHANCEMENT] Added `-usage-stats.installation-mode` configuration to track the installation mode via the anonymous usage statistics. #3244
- [ENHANCEMENT] Compactor: Add new `cortex_compactor_block_max_time_delta_seconds` histogram for detecting if compaction of blocks is lagging behind. #3240 #3429
- [ENHANCEMENT] Ingester: reduced the memory footprint of active series custom trackers. #2568
- [ENHANCEMENT] Distributor: Include `X-Scope-OrgId` header in requests forwarded to configured forwarding endpoint. #3283 #3385
- [ENHANCEMENT] Alertmanager: reduced memory utilization in Mimir clusters with a large number of tenants. #3309
- [ENHANCEMENT] Add experimental flag `-shutdown-delay` to allow components to wait after receiving SIGTERM and before stopping. During this time the component returns 503 from the `/ready` endpoint. #3298
- [ENHANCEMENT] Go: update to Go 1.19.3. #3371
- [ENHANCEMENT] Alerts: added `RulerRemoteEvaluationFailing` alert, firing when communication between ruler and frontend fails in remote operational mode. #3177 #3389
- [ENHANCEMENT] Clarify which S3 signature versions are supported in the error "unsupported signature version". #3376
- [ENHANCEMENT] Store-gateway: improved index header reading performance. #3393 #3397 #3436
- [ENHANCEMENT] Store-gateway: improved performance of series matching. #3391
- [ENHANCEMENT] Move the validation of incoming series before the distributor's forwarding functionality, so that we don't forward invalid series. #3386 #3458
- [ENHANCEMENT] S3 bucket configuration now validates that the endpoint does not have the bucket name prefix. #3414
- [ENHANCEMENT] Query-frontend: added "fetched index bytes" to query statistics, so that the statistics contain the total bytes read by store-gateways from TSDB block indexes. #3206
- [ENHANCEMENT] Distributor: push wrapper should only receive unforwarded samples. #2980
- [BUGFIX] Flusher: Add `Overrides` as a dependency to prevent panics when starting with `-target=flusher`. #3151
- [BUGFIX] Updated `golang.org/x/text` dependency to fix CVE-2022-32149. #3285
- [BUGFIX] Query-frontend: properly close gRPC streams to the query-scheduler to stop memory and goroutine leaks. #3302
- [BUGFIX] Ruler: persist evaluation delay configured in the rulegroup. #3392
- [BUGFIX] Ring status pages: show 100% ownership as "100%", not "1e+02%". #3435
- [BUGFIX] Fix panics in OTLP ingest path when parse errors exist. #3538
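The distributor's forwarding age filter (#3049, #3113) can be pictured with the minimal Python sketch below. It illustrates the idea under stated assumptions and is not Mimir's Go implementation. The `Sample` type, the `filter_for_forwarding` helper, and the millisecond timestamps are all hypothetical. The sketch also counts every dropped sample, because it assumes those samples are not ingested locally either. Only the `forwarded-sample-too-old` reason comes from the changelog.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    # Hypothetical sample type: timestamp in milliseconds since the epoch plus a value.
    timestamp_ms: int
    value: float


def filter_for_forwarding(samples, now_ms, max_age_ms, discarded):
    """Return only the samples that are young enough to be forwarded.

    Older samples are dropped and counted in `discarded` under the reason label
    that the changelog associates with cortex_discarded_samples_total. This
    sketch assumes the dropped samples are not ingested locally either.
    """
    cutoff_ms = now_ms - max_age_ms
    kept = []
    for sample in samples:
        if sample.timestamp_ms < cutoff_ms:
            discarded["forwarded-sample-too-old"] += 1
        else:
            kept.append(sample)
    return kept


discarded = Counter()
now_ms = 1_700_000_000_000
batch = [Sample(now_ms - 5_000, 1.0), Sample(now_ms - 120_000, 2.0)]
kept = filter_for_forwarding(batch, now_ms=now_ms, max_age_ms=60_000, discarded=discarded)
print(len(kept), discarded["forwarded-sample-too-old"])  # 1 1
```

The cutoff is computed relative to the current time, so the filter needs no state beyond the configured maximum age.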
Mixin
- [CHANGE] Alerts: Change `MimirSchedulerQueriesStuck` `for` time to 7 minutes to account for the time it takes for HPA to scale up. #3223
- [CHANGE] Dashboards: Removed the `Querier > Stages` panel from the `Mimir / Queries` dashboard. #3311
- [CHANGE] Configuration: The format of the `autoscaling` section of the configuration has changed to support more components. #3378
  - Instead of specific config variables for each component, they are listed in a dictionary. For example, `autoscaling.querier_enabled` becomes `autoscaling.querier.enabled`.
- [FEATURE] Dashboards: Added "Mimir / Overview resources" dashboard, providing a high-level view of a Mimir cluster's resource utilization. #3481
- [FEATURE] Dashboards: Added "Mimir / Overview networking" dashboard, providing a high-level view of a Mimir cluster's network bandwidth, inflight requests and TCP connections. #3487
- [FEATURE] Compile baremetal mixin alongside k8s mixin. #3162 #3514
- [ENHANCEMENT] Alerts: Add MimirRingMembersMismatch firing when a component does not have the expected number of running jobs. #2404
- [ENHANCEMENT] Dashboards: Add optional row about the Distributor's metric forwarding feature ...
2.5.0-rc.0
This release contains 227 PRs from 43 authors, including new contributors Aldo D'Aquino, Anıl Mısırlıoğlu, Charles Korn, Danny Staple, Dylan Crees, Eduardo Silvi, FG, Jesse Weaver, KarlisAG, Leegin-darknight, Rohan Kumar, Wille Faler, Y.Horie, manohar-koukuntla, paulroche, songjiayang, Éamon Ryan. Thank you!
Grafana Mimir version 2.5.0-rc.0 release notes
Grafana Labs is excited to announce version 2.5.0-rc.0 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Features and enhancements
- Alertmanager Discord support
  Alertmanager can now be configured to send alerts to Discord channels.
- Configurable TLS minimum version and cipher suites
  We added the flags `-server.tls-min-version` and `-server.tls-cipher-suites`, which can be used to define the minimum TLS version and the supported cipher suites in all HTTP and gRPC servers in Mimir.
- Lower memory usage in store-gateway, ingester and alertmanager
  We made various changes to how index lookups are performed and how the active series custom trackers are implemented, which result in better performance and lower overall memory usage in the store-gateway and ingester.
  We also optimized the alertmanager, which results in a 50% reduction in memory usage in use cases with larger numbers of tenants.
- Improved Mimir dashboards
  We added two new dashboards named `Mimir / Overview resources` and `Mimir / Overview networking`. Furthermore, we have made various improvements to the following existing dashboards:
  - `Mimir / Overview`: Add "remote read", "metadata", and "exemplar" queries.
  - `Mimir / Writes`: Add optional row about the distributor's new forwarding feature.
  - `Mimir / Tenants`: Add insights into the read path.
Helm chart improvements
- Zone aware replication
  Helm now supports deploying the ingesters and store-gateways across different availability zones. The replication is also zone-aware. As a result, multiple instances of one zone can fail without any service interruption, and rollouts can be performed faster because many instances of each zone can be restarted together, instead of all of them restarting in sequence. This is a breaking change; for details on how to upgrade, review the Helm changelog.
- Running without root privileges
  All Mimir, GEM and Agent processes no longer require root privileges to run.
- Unified reverse proxy (`gateway`) configuration for Mimir and GEM
  This change allows for an easier upgrade path from Mimir to GEM, without any downtime. The unified configuration also makes it possible to autoscale the GEM gateway pods, and it supports OpenShift Route. The change also deprecates the `nginx` section in the configuration. The section will be removed in release `7.0.0`.
- Updated MinIO
  The MinIO sub-chart was updated from `4.x` to `5.0.0`. Note that this update inherits a breaking change because the MinIO gateway mode was removed.
- Updated sizing plans
  We updated our sizing plans to better reflect how we recommend running Mimir and GEM in production. Note that this includes a breaking change for users of the "small" plan; more details can be found in the Helm changelog.
- Various quality of life improvements
  - Rollout strategies without downtime
  - Read path and compactor configuration refresh, providing better default settings
  - OTLP ingestion support in the Nginx configuration
  - A default configuration for the alertmanager, so that the user interface and the sending of alerts from the ruler work out of the box
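The zone-aware replication item says that instances of one zone can fail without interrupting service. That property follows from quorum arithmetic. The Python sketch below is a simplified model and not the ring code that Mimir uses. It assumes a replication factor of 3 and a write quorum of `RF // 2 + 1`, which is 2 of 3. A hypothetical list of `(instance, zone)` pairs stands in for the hash ring.

```python
REPLICATION_FACTOR = 3


def pick_replicas(ring, zone_aware):
    """Walk a hypothetical ring of (instance, zone) pairs and choose replicas.

    With zone_aware=True, at most one replica is taken from each zone.
    """
    replicas, used_zones = [], set()
    for instance, zone in ring:
        if zone_aware and zone in used_zones:
            continue
        replicas.append((instance, zone))
        used_zones.add(zone)
        if len(replicas) == REPLICATION_FACTOR:
            break
    return replicas


def write_succeeds(replicas, failed_zone):
    """A write succeeds if a quorum of replicas lives outside the failed zone."""
    quorum = REPLICATION_FACTOR // 2 + 1
    healthy = sum(1 for _, zone in replicas if zone != failed_zone)
    return healthy >= quorum


ring = [
    ("ingester-zone-a-0", "a"),
    ("ingester-zone-a-1", "a"),
    ("ingester-zone-b-0", "b"),
    ("ingester-zone-c-0", "c"),
]
print(write_succeeds(pick_replicas(ring, zone_aware=True), failed_zone="a"))   # True
print(write_succeeds(pick_replicas(ring, zone_aware=False), failed_zone="a"))  # False
```

In the zone-aware case, losing a whole zone removes at most one replica of each series. This is also why a rollout can restart many instances of one zone at the same time.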
Bug fixes
- Flusher: Added `Overrides` as a dependency to prevent panics when starting with `-target=flusher`. PR 3151
- Query-frontend: properly close gRPC streams to the query-scheduler to stop memory and goroutine leaks. PR 3302
- Ruler: persist evaluation delay configured in the rulegroup. PR 3392
- Fix panics in OTLP ingest path when parse errors occur. PR 3538
Changelog
2.5.0-rc.0
Grafana Mimir
- [CHANGE] Flag `-azure.msi-resource` is now ignored, and will be removed in Mimir 2.7. This setting is now made automatically by Azure. #2682
- [CHANGE] Experimental flag `-blocks-storage.tsdb.out-of-order-capacity-min` has been removed. #3261
- [CHANGE] Distributor: Wrap errors from pushing to ingesters with useful context, for example clarifying timeouts. #3307
- [CHANGE] The default value of `-server.http-write-timeout` has changed from 30s to 2m. #3346
- [CHANGE] Reduce period of health checks in connection pools for querier->store-gateway, ruler->ruler, and alertmanager->alertmanager clients to 10s. This reduces the time to fail a gRPC call when the remote stops responding. #3168
- [CHANGE] Hide TSDB block ranges period config from doc and mark it experimental. #3518
- [FEATURE] Alertmanager: added Discord support. #3309
- [ENHANCEMENT] Added `-server.tls-min-version` and `-server.tls-cipher-suites` flags to configure cipher suites and min TLS version supported by HTTP and gRPC servers. #2898
- [ENHANCEMENT] Distributor: Add age filter to forwarding functionality, to not forward samples which are older than defined duration. If such samples are not ingested, `cortex_discarded_samples_total{reason="forwarded-sample-too-old"}` is increased. #3049 #3113
- [ENHANCEMENT] Store-gateway: Reduce memory allocation when generating ids in index cache. #3179
- [ENHANCEMENT] Query-frontend: truncate queries based on the configured creation grace period (`--validation.create-grace-period`) to avoid querying too far into the future. #3172
- [ENHANCEMENT] Ingester: Reduce activity tracker memory allocation. #3203
- [ENHANCEMENT] Query-frontend: Log more detailed information in the case of a failed query. #3190
- [ENHANCEMENT] Added `-usage-stats.installation-mode` configuration to track the installation mode via the anonymous usage statistics. #3244
- [ENHANCEMENT] Compactor: Add new `cortex_compactor_block_max_time_delta_seconds` histogram for detecting if compaction of blocks is lagging behind. #3240 #3429
- [ENHANCEMENT] Ingester: reduced the memory footprint of active series custom trackers. #2568
- [ENHANCEMENT] Distributor: Include `X-Scope-OrgId` header in requests forwarded to configured forwarding endpoint. #3283 #3385
- [ENHANCEMENT] Alertmanager: reduced memory utilization in Mimir clusters with a large number of tenants. #3309
- [ENHANCEMENT] Add experimental flag `-shutdown-delay` to allow components to wait after receiving SIGTERM and before stopping. During this time the component returns 503 from the `/ready` endpoint. #3298
- [ENHANCEMENT] Go: update to Go 1.19.3. #3371
- [ENHANCEMENT] Alerts: added `RulerRemoteEvaluationFailing` alert, firing when communication between ruler and frontend fails in remote operational mode. #3177 #3389
- [ENHANCEMENT] Clarify which S3 signature versions are supported in the error "unsupported signature version". #3376
- [ENHANCEMENT] Store-gateway: improved index header reading performance. #3393 #3397 #3436
- [ENHANCEMENT] Store-gateway: improved performance of series matching. #3391
- [ENHANCEMENT] Move the validation of incoming series before the distributor's forwarding functionality, so that we don't forward invalid series. #3386 #3458
- [ENHANCEMENT] S3 bucket configuration now validates that the endpoint does not have the bucket name prefix. #3414
- [ENHANCEMENT] Query-frontend: added "fetched index bytes" to query statistics, so that the statistics contain the total bytes read by store-gateways from TSDB block indexes. #3206
- [ENHANCEMENT] Distributor: push wrapper should only receive unforwarded samples. #2980
- [BUGFIX] Flusher: Add `Overrides` as a dependency to prevent panics when starting with `-target=flusher`. #3151
- [BUGFIX] Updated `golang.org/x/text` dependency to fix CVE-2022-32149. #3285
- [BUGFIX] Query-frontend: properly close gRPC streams to the query-scheduler to stop memory and goroutine leaks. #3302
- [BUGFIX] Ruler: persist evaluation delay configured in the rulegroup. #3392
- [BUGFIX] Ring status pages: show 100% ownership as "100%", not "1e+02%". #3435
- [BUGFIX] Fix panics in OTLP ingest path when parse errors exist. #3538
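The experimental `-shutdown-delay` flag (#3298) changes the shutdown order. The component first starts failing its readiness check, then waits, and only then stops. The Python sketch below models that order under a few assumptions. `Readiness` and `handle_sigterm` are hypothetical names, and the HTTP layer is reduced to a status code. The `sleep` function is injectable, so the sequence can be observed without actually waiting. Mimir's real implementation is in Go and may differ.

```python
import time


class Readiness:
    """Hypothetical readiness state that a /ready handler would consult."""

    def __init__(self):
        self.shutting_down = False

    def status_code(self):
        # 200 while serving normally, 503 once SIGTERM has been received.
        return 503 if self.shutting_down else 200


def handle_sigterm(readiness, shutdown_delay_s, stop, sleep=time.sleep):
    """Fail readiness, wait for the configured delay, then stop the component."""
    readiness.shutting_down = True
    sleep(shutdown_delay_s)  # /ready keeps returning 503 during this window
    stop()


events = []
readiness = Readiness()
events.append(("before", readiness.status_code()))
handle_sigterm(
    readiness,
    shutdown_delay_s=5.0,
    stop=lambda: events.append(("stopped", readiness.status_code())),
    # Record what /ready would answer while the delay is in progress.
    sleep=lambda seconds: events.append(("waiting", seconds, readiness.status_code())),
)
print(events)
# [('before', 200), ('waiting', 5.0, 503), ('stopped', 503)]
```

The delay typically gives load balancers and service discovery time to see the 503 and stop routing new requests to the instance before the process exits.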
Mixin
- [CHANGE] Alerts: Change `MimirSchedulerQueriesStuck` `for` time to 7 minutes to account for the time it takes for HPA to scale up. #3223
- [CHANGE] Dashboards: Removed the `Querier > Stages` panel from the `Mimir / Queries` dashboard. #3311
- [CHANGE] Configuration: The format of the `autoscaling` section of the configuration has changed to support more components. #3378
  - Instead of specific config variables for each component, they are listed in a dictionary. For example, `autoscaling.querier_enabled` becomes `autoscaling.querier.enabled`.
- [FEATURE] Dashboards: Added "Mimir / Overview resources" dashboard, providing a high-level view of a Mimir cluster's resource utilization. #3481
- [FEATURE] Dashboards: Added "Mimir / Overview networking" dashboard, providing a high-level view of a Mimir cluster's network bandwidth, inflight requests and TCP connections. #3487
- [FEATURE] Compile baremetal mixin alongside k8s mixin. #3162 #3514
- [ENHANCEMENT] Alerts: Add MimirRingMembersMismatch firing when a component does not have the expected number of running jobs. #2404
- [ENHANCEMENT] Dashboards: Add optional row about the Distributor's metric ...
2.4.0
This release contains 190 PRs from 29 authors, including new contributors Fayzal Ghantiwala, Furkan Türkal, Joe Blubaugh, Justin Lei, Nicolas DUPEUX, Paul Puschmann, Radu Domnu, Shubham Ranjan. Thank you!
Grafana Mimir version 2.4.0 release notes
Grafana Labs is excited to announce version 2.4 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Note: If you are upgrading from Grafana Mimir 2.3, review the list of important changes that follow.
Features and enhancements
- Query-scheduler ring-based service discovery:
  The query-scheduler is an optional, stateless component that retains a queue of queries to execute, and distributes the workload to available queriers. To use the query-scheduler, query-frontends and queriers are required to discover the addresses of the query-scheduler instances.
  In addition to DNS-based service discovery, Mimir 2.4 introduces ring-based service discovery for the query-scheduler. When enabled, the query-schedulers join their own hash ring (similar to other Mimir components), and the query-frontends and queriers discover query-scheduler instances via the ring.
  Ring-based service discovery makes it easier to set up the query-scheduler in environments where you can't easily define a DNS entry that resolves to the running query-scheduler instances. For more information, refer to query-scheduler configuration.
- New API endpoint exposes per-tenant limits:
  Mimir 2.4 introduces a new API endpoint, which is available on all Mimir components that load the runtime configuration. The endpoint exposes the limits of the authenticated tenant. You can use this new API endpoint when developing custom integrations with Mimir that require looking up the actual limits that are applied on a given tenant. For more information, refer to Get tenant limits.
- New TLS configuration options:
  Mimir 2.4 introduces new options to configure the accepted TLS cipher suites and the minimum TLS version for the HTTP and gRPC clients that are used between Mimir components, or by Mimir to communicate with external services such as Consul or etcd.
  You can use these new configuration options to override the default TLS settings and meet your security policy requirements. For more information, refer to Securing Grafana Mimir communications with TLS.
- Maximum range query length limit:
  Mimir 2.4 introduces the new configuration option `-query-frontend.max-total-query-length` to limit the maximum range query length, which is computed as the query's `end` minus `start` timestamp. This limit is enforced in the query-frontend and defaults to `-store.max-query-length` if unset.
  The new configuration option allows you to set different limits for the maximum length of the received query (`-query-frontend.max-total-query-length`) and the maximum length of partial queries after splitting and sharding (`-store.max-query-length`).
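The fallback and the length computation described above can be expressed as a small Python sketch. The helper names are hypothetical. The sketch treats a zero duration as "unset" and also as "no limit". That is an assumption of the sketch; the release notes do not state it.

```python
from datetime import datetime, timedelta, timezone


def effective_max_total_query_length(max_total, store_max):
    """-query-frontend.max-total-query-length falls back to -store.max-query-length.

    Assumption for this sketch: timedelta(0) means "unset".
    """
    return max_total if max_total > timedelta(0) else store_max


def validate_range_query(start, end, max_total, store_max):
    """Reject a range query whose end minus start exceeds the effective limit."""
    limit = effective_max_total_query_length(max_total, store_max)
    length = end - start
    # Assumption: a limit of zero disables the check.
    if limit > timedelta(0) and length > limit:
        raise ValueError(f"query length {length} exceeds the limit of {limit}")
    return length


start = datetime(2022, 10, 1, tzinfo=timezone.utc)
end = start + timedelta(days=40)

# Total limit unset: the 30-day store limit also applies to the whole query.
try:
    validate_range_query(start, end, timedelta(0), timedelta(days=30))
except ValueError as err:
    print(err)

# A separate 90-day total limit accepts the 40-day query.
print(validate_range_query(start, end, timedelta(days=90), timedelta(days=30)))  # 40 days, 0:00:00
```

In the second case, `-store.max-query-length` would still bound the partial queries that splitting and sharding produce. The sketch does not model that step.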
The following experimental features have been promoted to stable:
Helm chart improvements
The mimir-distributed
Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.4 release, we’re also releasing version 3.2 of the mimir-distributed
Helm chart.
Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
- Added support for topologySpreadConstraints.
- Replaced the default anti-affinity rules with topologySpreadConstraints for all components, which places fewer restrictions on where Kubernetes can run pods.
- Important: if you are not using the sizing plans (small.yaml, large.yaml, capped-small.yaml, capped-large.yaml) in production, you must reintroduce pod anti-affinity rules for the ingester and store-gateway. This also fixes a missing label selector for the ingester.
  Merge the following with your custom values file:
  ```yaml
  ingester:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: target
                  operator: In
                  values:
                    - ingester
            topologyKey: "kubernetes.io/hostname"
          - labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/component
                  operator: In
                  values:
                    - ingester
            topologyKey: "kubernetes.io/hostname"
  store_gateway:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: target
                  operator: In
                  values:
                    - store-gateway
            topologyKey: "kubernetes.io/hostname"
          - labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/component
                  operator: In
                  values:
                    - store-gateway
            topologyKey: "kubernetes.io/hostname"
  ```
- Updated the anti-affinity rules in the sizing plans (small.yaml, large.yaml, capped-small.yaml, capped-large.yaml). The sizing plans now enforce that no two pods of the ingester, store-gateway, or alertmanager StatefulSets are scheduled on the same Node. Pods from different StatefulSets can share a Node.
- Added support for the OpenShift Route resource for nginx.
Important changes
In Grafana Mimir 2.4, the default values of the following configuration options have changed:
- `-distributor.remote-timeout` has changed from `20s` to `2s`.
- `-distributor.forwarding.request-timeout` has changed from `10s` to `2s`.
- `-blocks-storage.tsdb.head-compaction-concurrency` has changed from `5` to `1`.
- The hash-ring heartbeat period for distributors, ingesters, rulers, and compactors has increased from `5s` to `15s`.
In Grafana Mimir 2.4, the following deprecated configuration options have been removed:
- The YAML configuration option `limits.active_series_custom_trackers_config`.
- The CLI flag `-ingester.ring.join-after` and its respective YAML configuration option `ingester.ring.join_after`.
- The CLI flag `-querier.shuffle-sharding-ingesters-lookback-period` and its respective YAML configuration option `querier.shuffle_sharding_ingesters_lookback_period`.
With Grafana Mimir 2.4, the anonymous usage statistics tracking is enabled by default.
Mimir maintainers use this anonymous information to learn more about how the open source community runs Mimir and what the Mimir team should focus on when working on the next features and documentation improvements.
If possible, we ask you to keep the usage reporting feature enabled.
If you want to opt out of anonymous usage statistics reporting, refer to Disable the anonymous usage statistics reporting.
Bug fixes
- PR 2979: Fix remote write HTTP response status code returned by Mimir when failing to write only to one ingester (the quorum is still honored when running Mimir with the default replication factor of 3) and some series are not ingested because of validation errors or some limits being reached.
- PR 3005: Fix the querier to re-balance its worker connections when a query-frontend or query-scheduler instance is terminated.
- PR 2963: Fix the remote read endpoint to correctly support the `Accept-Encoding: snappy` HTTP request header.
Changelog
2.4.0
Grafana Mimir
- [CHANGE] Distributor: change the default value of `-distributor.remote-timeout` to `2s` from `20s` and `-distributor.forwarding.request-timeout` to `2s` from `10s` to improve distributor resource usage when ingesters crash. #2728 #2912
- [CHANGE] Anonymous usage statistics tracking: added the `-ingester.ring.store` value. #2981
- [CHANGE] Series metadata `HELP` that is longer than `-validation.max-metadata-length` is now truncated silently, instead of being dropped with a 400 status code. #2993
- [CHANGE] Ingester: changed default setting for `-ingester.ring.readiness-check-ring-health` from `true` to `false`. #2953
- [CHANGE] Anonymous usage statistics tracking has been enabled by default, to help Mimir maintainers make better decisions to support the open source community. #2939 #3034
- [CHANGE] Anonymous usage statistics tracking: added the minimum and maximum value of `-ingester.out-of-order-time-window`. #2940
- [CHANGE] The default hash ring heartbeat period for distributors, ingesters, rulers an...
2.4.0-rc.1
This release contains 8 PRs from 2 authors. Thank you!
Changelog
2.4.0-rc.1
Grafana Mimir
- [BUGFIX] Prevent the distributor from returning a 500 status code when a 400 was received from the ingester. #3211
- [BUGFIX] Fix incorrect OS value set in Mimir v2.3.* RPM packages. #3221
All changes in this release: mimir-2.4.0-rc.0...mimir-2.4.0-rc.1
2.4.0-rc.0
This release contains 166 PRs from 29 authors. Thank you!
Grafana Mimir version 2.4.0-rc.0 release notes
Grafana Labs is excited to announce version 2.4 of Grafana Mimir.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Note: If you are upgrading from Grafana Mimir 2.3, review the list of important changes that follow.
Features and enhancements
- Query-scheduler ring-based service discovery: The query-scheduler is an optional, stateless component that retains a queue of queries to execute, and distributes the workload to available queriers. To use the query-scheduler, query-frontends and queriers are required to discover the addresses of the query-scheduler instances.
  In addition to DNS-based service discovery, Mimir 2.4 introduces ring-based service discovery for the query-scheduler. When enabled, the query-schedulers join their own hash ring (similar to other Mimir components), and the query-frontends and queriers discover query-scheduler instances via the ring.
  Ring-based service discovery makes it easier to set up the query-scheduler in environments where you can’t easily define a DNS entry that resolves to the running query-scheduler instances. For more information, refer to query-scheduler configuration.
- New API endpoint exposes per-tenant limits: Mimir 2.4 introduces a new API endpoint, which is available on all Mimir components that load the runtime configuration. The endpoint exposes the limits of the authenticated tenant. You can use this new API endpoint when developing custom integrations with Mimir that require looking up the actual limits that are applied on a given tenant. For more information, refer to Get tenant limits.
- New TLS configuration options: Mimir 2.4 introduces new options to configure the accepted TLS cipher suites and the minimum TLS version for the HTTP and gRPC clients that are used between Mimir components, or by Mimir to communicate with external services such as Consul or etcd.
  You can use these new configuration options to override the default TLS settings and meet your security policy requirements. For more information, refer to Securing Grafana Mimir communications with TLS.
- Maximum range query length limit: Mimir 2.4 introduces the new configuration option `-query-frontend.max-total-query-length` to limit the maximum range query length, which is computed as the query’s end minus start timestamp. This limit is enforced in the query-frontend and defaults to `-store.max-query-length` if unset.
  The new configuration option allows you to set different limits for the maximum length of the received query (`-query-frontend.max-total-query-length`) and the maximum length of partial queries after splitting and sharding (`-store.max-query-length`).
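A custom integration that reads the limits of the authenticated tenant could build its request as in the Python sketch below. These release notes do not state several of its details: the path `/prometheus/api/v1/user_limits`, the host name, and the JSON response body are all assumptions. Check them against the Get tenant limits reference before relying on them. The `X-Scope-OrgID` header is the tenant header named in the distributor forwarding entries of the 2.5.0 changelog.

```python
import json
import urllib.request

# Assumed endpoint path; confirm it against the "Get tenant limits" API reference.
USER_LIMITS_PATH = "/prometheus/api/v1/user_limits"


def build_limits_request(base_url, tenant_id):
    """Build a GET request for the limits of the tenant identified by tenant_id."""
    return urllib.request.Request(
        base_url.rstrip("/") + USER_LIMITS_PATH,
        headers={"X-Scope-OrgID": tenant_id},  # tenant header used by Mimir
        method="GET",
    )


def fetch_limits(base_url, tenant_id, timeout_s=10):
    """Send the request and decode the body, assuming the endpoint returns JSON."""
    with urllib.request.urlopen(build_limits_request(base_url, tenant_id), timeout=timeout_s) as resp:
        return json.load(resp)


# Hypothetical host; building the request does not open a connection.
req = build_limits_request("http://mimir.example:8080/", "tenant-1")
print(req.full_url)  # http://mimir.example:8080/prometheus/api/v1/user_limits
print(req.get_header("X-scope-orgid"))  # tenant-1
```

`urllib.request.Request` stores header names with `str.capitalize()`, so the header is looked up as `X-scope-orgid`. It is still sent on the wire with that name, and HTTP header names are case-insensitive.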
Helm chart improvements
The mimir-distributed
Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.4 release, we’re also releasing version 3.2 of the mimir-distributed
Helm chart.
Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
- Added support for topologySpreadConstraints.
- Replaced the default anti-affinity rules with topologySpreadConstraints for all components, which places fewer restrictions on where Kubernetes can run pods.
- Important: if you are not using the sizing plans (small.yaml, large.yaml, capped-small.yaml, capped-large.yaml) in production, you must reintroduce pod anti-affinity rules for the ingester and store-gateway. This also fixes a missing label selector for the ingester. Merge the following with your custom values file:
  ```yaml
  ingester:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: target
                  operator: In
                  values:
                    - ingester
            topologyKey: "kubernetes.io/hostname"
          - labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/component
                  operator: In
                  values:
                    - ingester
            topologyKey: "kubernetes.io/hostname"
  store_gateway:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
                - key: target
                  operator: In
                  values:
                    - store-gateway
            topologyKey: "kubernetes.io/hostname"
          - labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/component
                  operator: In
                  values:
                    - store-gateway
            topologyKey: "kubernetes.io/hostname"
  ```
- Updated the anti-affinity rules in the sizing plans (small.yaml, large.yaml, capped-small.yaml, capped-large.yaml). The sizing plans now enforce that no two pods of the ingester, store-gateway, or alertmanager StatefulSets are scheduled on the same Node. Pods from different StatefulSets can share a Node.
- Added support for the OpenShift Route resource for nginx.
Important changes
In Grafana Mimir 2.4, the default values of the following configuration options have changed:
- `-distributor.remote-timeout` has changed from `20s` to `2s`.
- `-distributor.forwarding.request-timeout` has changed from `10s` to `2s`.
- `-blocks-storage.tsdb.head-compaction-concurrency` has changed from `5` to `1`.
- The hash-ring heartbeat period for distributors, ingesters, rulers, and compactors has increased from `5s` to `15s`.
With Grafana Mimir 2.4, the anonymous usage statistics tracking is enabled by default. Mimir maintainers use this anonymous information to learn more about how the open source community runs Mimir and what the Mimir team should focus on when working on the next features and documentation improvements. If possible, we ask you to keep the usage reporting feature enabled. If you want to opt out of anonymous usage statistics reporting, refer to Disable the anonymous usage statistics reporting.
Bug fixes
- PR 2979: Fix remote write HTTP response status code returned by Mimir when failing to write only to one ingester (the quorum is still honored when running Mimir with the default replication factor of 3) and some series are not ingested because of validation errors or some limits being reached.
- PR 3005: Fix the querier to re-balance its worker connections when a query-frontend or query-scheduler instance is terminated.
- PR 2963: Fix the remote read endpoint to correctly support the `Accept-Encoding: snappy` HTTP request header.
Changelog
2.4.0-rc.0
Grafana Mimir
- [CHANGE] Distributor: change the default value of `-distributor.remote-timeout` to `2s` from `20s` and `-distributor.forwarding.request-timeout` to `2s` from `10s` to improve distributor resource usage when ingesters crash. #2728 #2912
- [CHANGE] Anonymous usage statistics tracking: added the `-ingester.ring.store` value. #2981
- [CHANGE] Series metadata `HELP` that is longer than `-validation.max-metadata-length` is now truncated silently, instead of being dropped with a 400 status code. #2993
- [CHANGE] Ingester: changed default setting for `-ingester.ring.readiness-check-ring-health` from `true` to `false`. #2953
- [CHANGE] Anonymous usage statistics tracking has been enabled by default, to help Mimir maintainers make better decisions to support the open source community. #2939 #3034
- [CHANGE] Anonymous usage statistics tracking: added the minimum and maximum value of `-ingester.out-of-order-time-window`. #2940
- [CHANGE] The default hash ring heartbeat period for distributors, ingesters, rulers and compactors has been increased from `5s` to `15s`. Now the default heartbeat period for all Mimir hash rings is `15s`. #3033
- [CHANGE] Reduce the default TSDB head compaction concurrency (`-blocks-storage.tsdb.head-compaction-concurrency`) from 5 to 1, in order to reduce CPU spikes. #3093
- [CHANGE] Ruler: the ruler's remote evaluation mode (`-ruler.query-frontend.address`) is now stable. #3109
- [CHANGE] Limits: removed the deprecated YAML configuration option `active_series_custom_trackers_config`. Please use `active_series_custom_trackers` instead. #3110
- [CHANGE] Ingester: removed the deprecated configuration option `-ingester.ring.join-after`. #3111
- [CHANGE] Querier: removed the deprecated configuration option `-querier.shuffle-sharding-ingesters-lookback-period`. The value of `-querier.query...
mimir-2.3.1
This release contains 5 PRs from 1 author. Thank you!
2.3.1
Grafana Mimir
- [BUGFIX] Query-frontend: query sharding took exponential time to map binary expressions. #3027
- [BUGFIX] Distributor: Stop panics on OTLP endpoint when a single metric has multiple timeseries. #3040
Full Changelog: mimir-2.3.0...mimir-2.3.1
mimir-2.3.0
Grafana Mimir version 2.3 release notes
Grafana Labs is excited to announce version 2.3 of Grafana Mimir, the most scalable, most performant open source time series database in the world.
The highlights that follow include the top features, enhancements, and bugfixes in this release. For the complete list of changes, see the changelog.
Note: If you are upgrading from Grafana Mimir 2.2, review the list of important changes that follow.
This release contains 370 PRs from 39 authors. Thank you!
Features and enhancements
- Ingest metrics in OpenTelemetry format: This release of Grafana Mimir introduces experimental support for ingesting metrics from the OpenTelemetry Collector's otlphttp exporter. This adds a second ingestion option for users of the OTel Collector. Mimir was already compatible with the prometheusremotewrite exporter. For more information, see Configure OTel Collector.
- Tenant federation for metadata queries: Users with tenant federation enabled could already issue instant queries, range queries, and exemplar queries to multiple tenants at once and receive a single aggregated result. With Grafana Mimir 2.3, we've added tenant federation support to the /api/v1/metadata endpoint as well.
- Simpler object storage configuration: Users can now configure block, alertmanager, and ruler storage all at once with the common YAML config option key (or the -common.storage.* CLI flags). Centralizing your object storage configuration in one place makes configuration faster and less error-prone. You can still configure storage individually for each of these components if you want to. For more information, see Common Configurations.
- .deb and .rpm packages for Mimir: Starting with version 2.3, we're publishing .deb and .rpm files for Grafana Mimir. These make installing and running Mimir on Debian- or Red Hat-based Linux systems much easier. Thank you to community contributor wilfriedroset for implementing this!
- Import historic data: Users can now use mimirtool to backfill time series data from an existing Prometheus or Cortex installation into Mimir. This makes it possible to migrate to Grafana Mimir without losing existing metrics data. This support is still experimental and does not yet work for data stored in Thanos. To learn more about this feature, see mimirtool backfill and Configure TSDB block upload.
- Increased instant query performance: Grafana Mimir now supports splitting instant queries by time. This lets Mimir parallelize the execution of instant queries and therefore return results faster. At present, splitting is only supported for a subset of instant queries, so not all instant queries will see a speedup. This feature is experimental and disabled by default. You can enable it with the split_instant_queries_by_interval YAML config option in the limits section (or the -query-frontend.split-instant-queries-by-interval CLI flag).
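The notes don't say how an instant query is split by time. The following Python sketch is only a rough mental model, not Mimir's implementation. It assumes a query of the form sum_over_time(metric[range]) evaluated at a single timestamp. It splits the range into consecutive sub-ranges of at most one interval, each addressed with an offset, and checks that adding the partial sums reproduces the full sum. The helper names, the (start, end] window convention, and the metric name queue_length are hypothetical. Recombining partial results this way only works for aggregations like sum_over_time, which is one reason why only a subset of instant queries can be split.

```python
def split_windows(eval_ts: int, range_s: int, interval_s: int) -> list[tuple[int, int]]:
    """Hypothetical helper: cover (eval_ts - range_s, eval_ts] with (start, end] windows, newest first."""
    if range_s <= 0 or interval_s <= 0:
        raise ValueError("range and interval must be positive")
    windows = []
    end = eval_ts
    oldest = eval_ts - range_s
    while end > oldest:
        start = max(end - interval_s, oldest)
        windows.append((start, end))
        end = start
    return windows


def to_promql(metric: str, eval_ts: int, windows: list[tuple[int, int]]) -> list[str]:
    """Render one sub-query per window, using an offset to shift it back from eval_ts."""
    queries = []
    for start, end in windows:
        offset = eval_ts - end
        query = f"sum_over_time({metric}[{end - start}s]"
        query += f" offset {offset}s)" if offset else ")"
        queries.append(query)
    return queries


def sum_in_window(samples: list[tuple[int, float]], start: int, end: int) -> float:
    """Sum sample values whose timestamp falls in (start, end]."""
    return sum(value for ts, value in samples if start < ts <= end)


windows = split_windows(eval_ts=3600, range_s=3600, interval_s=900)
for query in to_promql("queue_length", 3600, windows):
    print(query)

# One sample per minute with value 1.0: the partial sums add up to the full-range sum.
samples = [(ts, 1.0) for ts in range(0, 3601, 60)]
print(sum(sum_in_window(samples, s, e) for s, e in windows), sum_in_window(samples, 0, 3600))
```

Splitting into independent sub-queries is what makes parallel execution possible. Each window can be evaluated separately and the partial results combined afterwards.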
Helm chart improvements
The Mimir Helm chart is the best way to install Mimir on Kubernetes. As part of the Mimir 2.3 release, we’re also releasing version 3.1 of the Mimir Helm chart.
Notable enhancements follow. For the full list of changes, see the Helm chart changelog.
- We've upgraded the MinIO subchart dependency from a deprecated chart to the supported one. This creates a breaking change in how the administrator password is set. However, as the built-in MinIO is not a recommended object store for production use cases, this change did not warrant a new major version of the Mimir Helm chart.
- Query sharding is now enabled by default, which should give you better performance on high-cardinality metrics queries.
- To compensate for the increased number of queries generated by query sharding, the query scheduler component is now enabled by default.
- The backfill API endpoints for importing historic time series data are now exposed on the Nginx gateway.
- Nginx now sets the value of the X-Scope-OrgID header to the value of Mimir's no_auth_tenant parameter by default. The previous release set X-Scope-OrgID to anonymous by default, which complicated migrating to Mimir.
- Memberlist now uses DNS service discovery by default, which decreases startup time for large Mimir clusters.
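Mimir identifies the tenant from the X-Scope-OrgID header. With tenant federation, a single request can name several tenants. The following Python sketch builds, but does not send, a federated request to the metadata endpoint that now supports federation. Several details are assumptions made for illustration: Mimir listening on localhost:8080, the /prometheus HTTP prefix in front of /api/v1/metadata, tenant federation already enabled on the cluster, and the tenant IDs team-a and team-b. The sketch also relies on the "|" separator that tenant federation uses between tenant IDs.

```python
import urllib.request

# Assumed address and prefix; adjust to your deployment.
METADATA_URL = "http://localhost:8080/prometheus/api/v1/metadata"


def federated_metadata_request(tenants: list[str], url: str = METADATA_URL) -> urllib.request.Request:
    """Build (but do not send) a metadata request spanning several tenants."""
    if not tenants:
        raise ValueError("at least one tenant ID is required")
    # Tenant federation accepts several tenant IDs joined with "|" in X-Scope-OrgID.
    return urllib.request.Request(url, headers={"X-Scope-OrgID": "|".join(tenants)})


req = federated_metadata_request(["team-a", "team-b"])
# urllib normalizes header names with str.capitalize(), hence the "X-scope-orgid" lookup key.
print(req.get_method(), req.full_url, req.get_header("X-scope-orgid"))
# To actually send it: urllib.request.urlopen(req)
```

When you go through the Helm chart's Nginx gateway without your own authentication layer, Nginx fills in this header for you (defaulting to no_auth_tenant, as described above). A client that sets the header itself is only needed when you talk to Mimir directly.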
Important changes
In Grafana Mimir 2.3 we have removed the following previously deprecated configuration options:
- The extend_writes parameter in the distributor YAML configuration and the -distributor.extend-writes CLI flag have been removed.
- The active_series_custom_trackers parameter has been removed from the YAML configuration. It had already been moved to the runtime configuration. See #1188 for details.
- The blocks-storage.tsdb.isolation-enabled parameter in the YAML configuration and the -blocks-storage.tsdb.isolation-enabled CLI flag have been removed.
With Grafana Mimir 2.3, we have also changed the default value of the -distributor.ha-tracker.max-clusters CLI flag to 100 to provide denial-of-service protection. Previously, -distributor.ha-tracker.max-clusters was unlimited by default. This could allow a tenant with HA Dedupe enabled to overload the HA tracker with __cluster__ label values and cause the HA Dedupe database to fail.
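To illustrate why a cap protects the HA tracker, the following Python sketch models a per-tenant limit on distinct cluster label values. It is a hypothetical in-memory model, not Mimir's HA tracker, which keeps its state in a key-value store. The class and method names are invented. The sketch also assumes that a limit of 0 means unlimited, mirroring the previous unlimited default.

```python
from collections import defaultdict


class ClusterLimiter:
    """Hypothetical model: cap the number of distinct HA clusters tracked per tenant."""

    def __init__(self, max_clusters: int = 100):
        # Assumption for this sketch: 0 means "no limit", like the old default.
        self.max_clusters = max_clusters
        self._clusters: dict[str, set[str]] = defaultdict(set)

    def admit(self, tenant: str, cluster: str) -> bool:
        """Return True if samples from this cluster may be tracked for the tenant."""
        seen = self._clusters[tenant]
        if cluster in seen:
            return True  # already tracked; no new state is created
        if self.max_clusters > 0 and len(seen) >= self.max_clusters:
            return False  # reject instead of growing tracker state without bound
        seen.add(cluster)
        return True


limiter = ClusterLimiter(max_clusters=2)
print([limiter.admit("tenant-1", c) for c in ("a", "b", "c", "a")])  # [True, True, False, True]
```

The key point is that only new cluster values are rejected once the cap is reached. Clusters that are already tracked keep working, so a tenant flooding the tracker with fresh label values cannot grow its state past the limit.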
Also, as noted above, the administrator password for Helm chart deployments using the built-in MinIO is now set differently.
Bug fixes
- PR 2447: Fix incorrect mapping of HTTP status code 429 to 500 when the request queue is full in the query-frontend. Previously, a retryable 429 "Too Many Outstanding Requests" error from a querier was incorrectly returned by the query-frontend as a non-retryable 500 system error.
- PR 2505: The Memberlist key-value (KV) store now tries to "fast-join" the cluster to avoid serving an empty KV store. This fix addresses the confusing "empty ring" error response and the "ring doesn't exist in KV store yet" error log message that services emitted at startup even when other members were already present in the ring. Users of other key-value stores (for example, Consul or etcd) are not affected by this bug.
- PR 2289: The "List Prometheus rules" API endpoint of the Mimir Ruler component is no longer blocked while rules are being synced. This means users can now list rules while syncing larger rule sets.
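The distinction behind PR 2447 is that a full queue is a temporary condition a client can retry, while a 500 signals a server failure. The following Python sketch models only that classification, as described in the note. It is not Mimir's code. The queue class, the exception name, and both helper functions are hypothetical.

```python
from collections import deque
from http import HTTPStatus


class TooManyOutstandingRequests(Exception):
    """Hypothetical error raised when the request queue is full."""


class RequestQueue:
    """Hypothetical bounded queue standing in for the query-frontend queue."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items: deque = deque()

    def enqueue(self, request) -> None:
        if len(self._items) >= self.capacity:
            raise TooManyOutstandingRequests("too many outstanding requests")
        self._items.append(request)


def status_for(err: Exception) -> HTTPStatus:
    # The behavior the fix restores: a full queue maps to 429, not a generic 500.
    if isinstance(err, TooManyOutstandingRequests):
        return HTTPStatus.TOO_MANY_REQUESTS
    return HTTPStatus.INTERNAL_SERVER_ERROR


def is_retryable(status: HTTPStatus) -> bool:
    # Per the release note: 429 is retryable, the 500 system error is not.
    return status == HTTPStatus.TOO_MANY_REQUESTS


queue = RequestQueue(capacity=1)
queue.enqueue("query-1")
try:
    queue.enqueue("query-2")
except TooManyOutstandingRequests as err:
    status = status_for(err)
    print(int(status), is_retryable(status))  # 429 True
```
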
Changelog
2.3.0
Grafana Mimir
- [CHANGE] Ingester: Added user label to ingester metric cortex_ingester_tsdb_out_of_order_samples_appended_total. On multitenant clusters this helps us find the rate of appended out-of-order samples for a specific tenant. #2493
- [CHANGE] Compactor: delete source and output blocks from local disk when compaction fails, to reduce the likelihood that subsequent compactions fail because no space is left on disk. #2261
- [CHANGE] Ruler: Remove unused CLI flags -ruler.search-pending-for and -ruler.flush-period (and their respective YAML config options). #2288
- [CHANGE] Successful gRPC requests are no longer logged (only affects internal API calls). #2309
- [CHANGE] Add new -*.consul.cas-retry-delay flags. They have a default value of 1s, while previously there was no delay between retries. #2309
- [CHANGE] Store-gateway: Remove the experimental ability to run requests in a dedicated OS thread pool and the associated CLI flag -store-gateway.thread-pool-size. #2423
- [CHANGE] Memberlist: disabled TCP-based ping fallback, because Mimir already uses a custom transport based on TCP. #2456
- [CHANGE] Change default value for -distributor.ha-tracker.max-clusters to 100 to provide DoS protection. #2465
- [CHANGE] Experimental block upload API exposed by compactor has changed: the previous /api/v1/upload/block/{block} endpoint for starting a block upload is now /api/v1/upload/block/{block}/start, and the previous /api/v1/upload/block/{block}?uploadComplete=true endpoint for finishing a block upload is now /api/v1/upload/block/{block}/finish. A new API endpoint has been added: /api/v1/upload/block/{block}/check. #2486 #2548
- [CHANGE] Compactor: changed -compactor.max-compaction-time default from 0s (disabled) to 1h. When compacting blocks for a tenant, the compactor will move on to compact blocks of another tenant or re-plan blocks to compact at least every 1h. #2514...
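Clients of the experimental block upload API need to switch to the new paths listed in #2486 and #2548. The following Python sketch shows that path mapping and nothing more. It does not cover HTTP methods, request bodies, hosts, or authentication, which the changelog doesn't specify. The helper names and the block ID 01ABC are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

PREFIX = "/api/v1/upload/block/"


def new_endpoints(block_id: str) -> dict[str, str]:
    """New per-block upload endpoints introduced in Mimir 2.3."""
    base = PREFIX + block_id
    return {"start": f"{base}/start", "finish": f"{base}/finish", "check": f"{base}/check"}


def migrate_legacy_path(legacy: str) -> str:
    """Map a pre-2.3 upload path (optionally with ?uploadComplete=true) to its new equivalent."""
    parts = urlsplit(legacy)
    if not parts.path.startswith(PREFIX):
        raise ValueError(f"not a block upload path: {legacy}")
    block_id = parts.path[len(PREFIX):]
    if not block_id or "/" in block_id:
        raise ValueError(f"not a legacy block upload path: {legacy}")
    finished = parse_qs(parts.query).get("uploadComplete") == ["true"]
    return new_endpoints(block_id)["finish" if finished else "start"]


print(migrate_legacy_path("/api/v1/upload/block/01ABC"))
print(migrate_legacy_path("/api/v1/upload/block/01ABC?uploadComplete=true"))
print(new_endpoints("01ABC")["check"])
```
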