From 6978ee5d73878d7406bda68296a3dd8ecda4d538 Mon Sep 17 00:00:00 2001 From: Ed Welch Date: Mon, 26 Oct 2020 11:47:40 -0400 Subject: [PATCH] Loki Release: update release notes and docs (#2808) * update release notes and docs! * add go and cortex version * tweaks * tweak wording * change paths * typo, thanks Cyril ;) * thanks Owen ;) --- CHANGELOG.md | 210 +++++++ docs/sources/alerting/_index.md | 4 +- docs/sources/configuration/examples.md | 2 +- docs/sources/getting-started/_index.md | 1 + docs/sources/getting-started/grafana.md | 13 +- docs/sources/getting-started/logcli.md | 2 +- docs/sources/operations/storage/_index.md | 4 +- .../operations/storage/boltdb-shipper.md | 6 +- docs/sources/operations/storage/filesystem.md | 100 ---- docs/sources/operations/upgrade.md | 367 +----------- docs/sources/storage/_index.md | 84 +-- docs/sources/upgrading/_index.md | 559 ++++++++++++++++++ 12 files changed, 838 insertions(+), 514 deletions(-) create mode 100644 docs/sources/upgrading/_index.md diff --git a/CHANGELOG.md b/CHANGELOG.md index bd370f044360c..c90949f5fcacb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,213 @@ +## 2.0.0 (2020/10/26) + +2.0.0 is here!! + +We are extremely excited about the new features in 2.0.0, unlocking a whole new world of observability of our logs. + +Thanks again for the many incredible contributions and improvements from the wonderful Loki community, we are very excited for the future! + +### Important Notes + +**Please Note** There are several changes in this release which require your attention! + +* Anyone using a docker image please go read the [upgrade guide](https://github.com/grafana/loki/blob/master/docs/upgrading/_index.md#200)!! There is one important consideration around a potentially breaking schema change depending on your configuration. 
+* MAJOR changes have been made to the boltdb-shipper index. Breaking changes are not expected, but extra precautions are highly recommended; more details are in the [upgrade guide](https://github.com/grafana/loki/blob/master/docs/upgrading/_index.md#200). +* The long-deprecated `entry_parser` config in Promtail has been removed; use [pipeline_stages](https://grafana.com/docs/loki/latest/clients/promtail/configuration/#pipeline_stages) instead. + +Check the [upgrade guide](https://github.com/grafana/loki/blob/master/docs/upgrading/_index.md#200) for detailed information on all these changes. + +### 2.0!!!! + +There are too many PRs to list individually for the major improvements that we thought justified a 2.0, but here is the high-level summary: + +* Significant enhancements to the [LogQL query language](https://grafana.com/docs/loki/latest/logql/)! +** [Parse](https://grafana.com/docs/loki/latest/logql/#parser-expression) your logs to extract labels at query time. +** [Filter](https://grafana.com/docs/loki/latest/logql/#label-filter-expression) on labels extracted at query time. +** [Format](https://grafana.com/docs/loki/latest/logql/#line-format-expression) your log lines any way you please! +** [Graph](https://grafana.com/docs/loki/latest/logql/#unwrapped-range-aggregations) the contents of your log lines as metrics, including support for many more of your favorite PromQL functions. +* Generate Prometheus [alerts directly from your logs](https://grafana.com/docs/loki/latest/alerting/)! +** Create alerts using the same Prometheus alert rule syntax and let Loki send alerts directly to your Prometheus Alertmanager! +* [boltdb-shipper](https://grafana.com/docs/loki/latest/operations/storage/boltdb-shipper/) is now production ready! +** This is it! Now Loki only needs a single object store (S3, GCS, Filesystem...) to store all the data, no more Cassandra, DynamoDB or Bigtable! 
+ +We are extremely excited about these new features, expect some talks, webinars, and blogs where we explain all this new functionality in detail. + +### Notable mention + +This is a small change but very helpful! + +* [2737](https://github.com/grafana/loki/pull/2737) **dlemel8**: cmd/loki: add "verify-config" flag + +Thank you @dlemel8 for this PR! Now you can start Loki with `-verify-config` to make sure your config is valid and Loki will exit with a status code 0 if it is! + +### All Changes + +#### Loki +* [2804](https://github.com/grafana/loki/pull/2804) **slim-bean**: Loki: log any chunk fetch failure +* [2803](https://github.com/grafana/loki/pull/2803) **slim-bean**: Update local and docker default config files to use boltdb-shipper with a few other config changes +* [2796](https://github.com/grafana/loki/pull/2796) **cyriltovena**: Fixes a bug that would add __error__ label incorrectly. +* [2793](https://github.com/grafana/loki/pull/2793) **cyriltovena**: Improve the way we reverse iterator for backward queries. +* [2790](https://github.com/grafana/loki/pull/2790) **sandeepsukhani**: Boltdb shipper metrics changes +* [2788](https://github.com/grafana/loki/pull/2788) **sandeepsukhani**: add a metric in compactor to record timestamp of last successful run +* [2786](https://github.com/grafana/loki/pull/2786) **cyriltovena**: Logqlv2 pushes groups down to edge +* [2778](https://github.com/grafana/loki/pull/2778) **cyriltovena**: Logqv2 optimization +* [2774](https://github.com/grafana/loki/pull/2774) **cyriltovena**: Handle panic in the store goroutine. +* [2773](https://github.com/grafana/loki/pull/2773) **cyriltovena**: Fixes race conditions in the batch iterator. 
+* [2770](https://github.com/grafana/loki/pull/2770) **sandeepsukhani**: Boltdb shipper query performance improvements +* [2769](https://github.com/grafana/loki/pull/2769) **cyriltovena**: LogQL: Labels and Metrics Extraction +* [2768](https://github.com/grafana/loki/pull/2768) **cyriltovena**: Fixes all lint errors. +* [2761](https://github.com/grafana/loki/pull/2761) **owen-d**: Service discovery refactor +* [2755](https://github.com/grafana/loki/pull/2755) **owen-d**: Revendor Cortex +* [2752](https://github.com/grafana/loki/pull/2752) **kavirajk**: fix: Remove depricated `entry_parser` from scrapeconfig +* [2741](https://github.com/grafana/loki/pull/2741) **owen-d**: better tenant logging in ruler memstore +* [2737](https://github.com/grafana/loki/pull/2737) **dlemel8**: cmd/loki: add "verify-config" flag +* [2735](https://github.com/grafana/loki/pull/2735) **cyriltovena**: Fixes the frontend logs to include org_id. +* [2732](https://github.com/grafana/loki/pull/2732) **sandeepsukhani**: set timestamp in instant query done by canaries +* [2726](https://github.com/grafana/loki/pull/2726) **dvrkps**: hack: clean getStore +* [2711](https://github.com/grafana/loki/pull/2711) **owen-d**: removes r/w pools from block/chunk types +* [2709](https://github.com/grafana/loki/pull/2709) **cyriltovena**: Bypass sharding middleware when a query can't be sharded. +* [2671](https://github.com/grafana/loki/pull/2671) **alrs**: pkg/querier: fix dropped error +* [2665](https://github.com/grafana/loki/pull/2665) **cnbailian**: Loki: Querier APIs respond JSON Content-Type +* [2663](https://github.com/grafana/loki/pull/2663) **owen-d**: improves numeric literal stringer impl +* [2662](https://github.com/grafana/loki/pull/2662) **owen-d**: exposes rule group validation fn +* [2661](https://github.com/grafana/loki/pull/2661) **owen-d**: Enable local rules backend & disallow configdb. 
+* [2656](https://github.com/grafana/loki/pull/2656) **sandeepsukhani**: run multiple queries per table at once with boltdb-shipper +* [2655](https://github.com/grafana/loki/pull/2655) **sandeepsukhani**: fix store query bug when running loki in single binary mode with boltdb-shipper +* [2650](https://github.com/grafana/loki/pull/2650) **owen-d**: Adds prometheus ruler routes +* [2647](https://github.com/grafana/loki/pull/2647) **arl**: pkg/chunkenc: fix test using string(int) conversion +* [2645](https://github.com/grafana/loki/pull/2645) **arl**: Tests: fix issue 2356: distributor_test.go fails when the system has no interface name in [eth0, en0, lo0] +* [2642](https://github.com/grafana/loki/pull/2642) **sandeepsukhani**: fix an issue with building loki +* [2640](https://github.com/grafana/loki/pull/2640) **sandeepsukhani**: improvements for boltdb-shipper compactor +* [2637](https://github.com/grafana/loki/pull/2637) **owen-d**: Ruler docs + single binary inclusion +* [2627](https://github.com/grafana/loki/pull/2627) **sandeepsukhani**: revendor cortex to latest master +* [2620](https://github.com/grafana/loki/pull/2620) **alrs**: pkg/storage/stores/shipper/uploads: fix test error +* [2614](https://github.com/grafana/loki/pull/2614) **cyriltovena**: Improve lz4 compression +* [2613](https://github.com/grafana/loki/pull/2613) **sandeepsukhani**: fix a panic when trying to stop boltdb-shipper multiple times using sync.once +* [2610](https://github.com/grafana/loki/pull/2610) **slim-bean**: Loki: Fix query-frontend ready handler +* [2601](https://github.com/grafana/loki/pull/2601) **sandeepsukhani**: rpc for querying ingesters to get chunk ids from its store +* [2589](https://github.com/grafana/loki/pull/2589) **owen-d**: Ruler/loki rule validator +* [2582](https://github.com/grafana/loki/pull/2582) **yeya24**: Add _total suffix to ruler counter metrics +* [2580](https://github.com/grafana/loki/pull/2580) **owen-d**: strict rule unmarshaling +* 
[2578](https://github.com/grafana/loki/pull/2578) **owen-d**: exports grouploader +* [2576](https://github.com/grafana/loki/pull/2576) **owen-d**: Better rule loading +* [2574](https://github.com/grafana/loki/pull/2574) **sandeepsukhani**: fix closing of compressed file from boltdb-shipper compactor +* [2572](https://github.com/grafana/loki/pull/2572) **adityacs**: Validate max_query_length in Labels API +* [2564](https://github.com/grafana/loki/pull/2564) **owen-d**: Error on no schema configs +* [2559](https://github.com/grafana/loki/pull/2559) **sandeepsukhani**: fix dir setup based on which mode it is running +* [2558](https://github.com/grafana/loki/pull/2558) **sandeepsukhani**: cleanup boltdb files in queriers during startup/shutdown +* [2552](https://github.com/grafana/loki/pull/2552) **owen-d**: fixes batch metrics help text & corrects bucketing +* [2550](https://github.com/grafana/loki/pull/2550) **sandeepsukhani**: fix a flaky test in boltdb shipper +* [2548](https://github.com/grafana/loki/pull/2548) **sandeepsukhani**: add some metrics for monitoring compactor +* [2546](https://github.com/grafana/loki/pull/2546) **sandeepsukhani**: register boltdb shipper compactor cli flags +* [2543](https://github.com/grafana/loki/pull/2543) **sandeepsukhani**: revendor cortex to latest master +* [2534](https://github.com/grafana/loki/pull/2534) **owen-d**: Consistent chunk metrics +* [2530](https://github.com/grafana/loki/pull/2530) **sandeepsukhani**: minor fixes and improvements for boltdb shipper +* [2526](https://github.com/grafana/loki/pull/2526) **sandeepsukhani**: compactor for compacting boltdb files uploaded by shipper +* [2510](https://github.com/grafana/loki/pull/2510) **owen-d**: adds batch based metrics +* [2507](https://github.com/grafana/loki/pull/2507) **sandeepsukhani**: compress boltdb files to gzip while uploading from shipper +* [2458](https://github.com/grafana/loki/pull/2458) **owen-d**: Feature/ruler (take 2) +* 
[2487](https://github.com/grafana/loki/pull/2487) **sandeepsukhani**: upload boltdb files from shipper only when they are not expected to be modified or during shutdown + +#### Docs +* [2797](https://github.com/grafana/loki/pull/2797) **cyriltovena**: Logqlv2 docs +* [2772](https://github.com/grafana/loki/pull/2772) **DesistDaydream**: reapir Retention Example Configuration +* [2762](https://github.com/grafana/loki/pull/2762) **PabloCastellano**: fix: typo in upgrade.md +* [2750](https://github.com/grafana/loki/pull/2750) **owen-d**: fixes path in prom rules api docs +* [2733](https://github.com/grafana/loki/pull/2733) **owen-d**: Removes wrong capitalizations +* [2728](https://github.com/grafana/loki/pull/2728) **vishesh92**: Docs: Update docs for redis +* [2725](https://github.com/grafana/loki/pull/2725) **dvrkps**: fix some misspells +* [2724](https://github.com/grafana/loki/pull/2724) **MadhavJivrajani**: DOCS: change format of unordered lists in technical docs +* [2716](https://github.com/grafana/loki/pull/2716) **huikang**: Doc: fixing parameter name in configuration +* [2705](https://github.com/grafana/loki/pull/2705) **owen-d**: shows cortextool lint command for loki in alerting docs +* [2702](https://github.com/grafana/loki/pull/2702) **huikang**: Doc: fix broken links in production/README.md +* [2699](https://github.com/grafana/loki/pull/2699) **sandangel**: docs: use repetitive numbering +* [2698](https://github.com/grafana/loki/pull/2698) **bemasher**: Doc: Vague link text. 
+* [2697](https://github.com/grafana/loki/pull/2697) **owen-d**: updates alerting docs with new cortex tool loki linting support +* [2692](https://github.com/grafana/loki/pull/2692) **philnichol**: Docs: Corrected incorrect instances of (setup|set up) +* [2691](https://github.com/grafana/loki/pull/2691) **UniqueTokens**: Update metrics.md +* [2689](https://github.com/grafana/loki/pull/2689) **pgassmann**: docker plugin documentation update +* [2686](https://github.com/grafana/loki/pull/2686) **demon**: docs: Fix link to code of conduct +* [2657](https://github.com/grafana/loki/pull/2657) **owen-d**: fixes ruler docs & includes ruler configs in cmd/configs + docker img +* [2622](https://github.com/grafana/loki/pull/2622) **sandeepsukhani**: add compactor details and other boltdb-shipper doc improvments +* [2621](https://github.com/grafana/loki/pull/2621) **cyriltovena**: Fixes links in aws tutorials. +* [2606](https://github.com/grafana/loki/pull/2606) **cyriltovena**: More template stage examples. 
+* [2605](https://github.com/grafana/loki/pull/2605) **Decad**: Update docs to use raw link +* [2600](https://github.com/grafana/loki/pull/2600) **slim-bean**: Docs: Fix broken links on generated site +* [2597](https://github.com/grafana/loki/pull/2597) **nek-00-ken**: Fixup: url to access promtail config sample +* [2595](https://github.com/grafana/loki/pull/2595) **sh0rez**: docs: fix broken links +* [2594](https://github.com/grafana/loki/pull/2594) **wardbekker**: Update README.md +* [2592](https://github.com/grafana/loki/pull/2592) **owen-d**: fixes some doc links +* [2591](https://github.com/grafana/loki/pull/2591) **woodsaj**: Docs: fix links in installation docs +* [2586](https://github.com/grafana/loki/pull/2586) **ms42Q**: Doc fixes: remove typos and long sentence +* [2579](https://github.com/grafana/loki/pull/2579) **oddlittlebird**: Update CODEOWNERS +* [2566](https://github.com/grafana/loki/pull/2566) **owen-d**: Website doc link fixes +* [2528](https://github.com/grafana/loki/pull/2528) **owen-d**: Update tanka.md with steps for using k8s-alpha lib +* [2512](https://github.com/grafana/loki/pull/2512) **palemtnrider**: Documentation: Fixes install and getting-started links in the readme +* [2508](https://github.com/grafana/loki/pull/2508) **owen-d**: memberlist correct yaml path. 
closes #2499 +* [2506](https://github.com/grafana/loki/pull/2506) **ferdikurniawan**: Docs: fix dead link +* [2505](https://github.com/grafana/loki/pull/2505) **sh0rez**: doc: close code block +* [2501](https://github.com/grafana/loki/pull/2501) **tivvit**: fix incorrect upgrade link +* [2500](https://github.com/grafana/loki/pull/2500) **oddlittlebird**: Docs: Update README.md + +#### Helm +* [2746](https://github.com/grafana/loki/pull/2746) **marcosartori**: helm/fluentbit K8S-Logging.Exclude & and Mem_Buf_Limit toggle +* [2742](https://github.com/grafana/loki/pull/2742) **steven-sheehy**: Fix linting errors and use of deprecated repositories +* [2659](https://github.com/grafana/loki/pull/2659) **rskrishnar**: [Promtail] enables configuring psp in helm chart +* [2554](https://github.com/grafana/loki/pull/2554) **alexandre-allard-scality**: production/helm: add support for PV selector in Loki statefulset + +#### FluentD +* [2739](https://github.com/grafana/loki/pull/2739) **jgehrcke**: FluentD loki plugin: add support for bearer_token_file parameter + +#### Fluent Bit +* [2568](https://github.com/grafana/loki/pull/2568) **zjj2wry**: fluent-bit plugin support TLS + +#### Promtail +* [2723](https://github.com/grafana/loki/pull/2723) **carlpett**: Promtail: Add counter promtail_batch_retries_total +* [2717](https://github.com/grafana/loki/pull/2717) **slim-bean**: Promtail: Fix deadlock on tailer shutdown. 
+* [2710](https://github.com/grafana/loki/pull/2710) **slim-bean**: Promtail: (and also fluent-bit) change the max batch size to 1MB +* [2708](https://github.com/grafana/loki/pull/2708) **Falco20019**: Promtail: Fix timestamp parser for short year format +* [2658](https://github.com/grafana/loki/pull/2658) **slim-bean**: Promtail: do not mark the position if the file is removed +* [2618](https://github.com/grafana/loki/pull/2618) **slim-bean**: Promtail: Add a stream lagging metric +* [2615](https://github.com/grafana/loki/pull/2615) **aminjam**: Add fallback_formats for timestamp stage +* [2603](https://github.com/grafana/loki/pull/2603) **rfratto**: Expose UserAgent and fix User-Agent version source +* [2575](https://github.com/grafana/loki/pull/2575) **unguiculus**: Promtail: Fix docker-compose.yaml +* [2571](https://github.com/grafana/loki/pull/2571) **rsteneteg**: Promtail: adding pipeline stage for dropping labels +* [2570](https://github.com/grafana/loki/pull/2570) **slim-bean**: Promtail: Fix concurrent map iteration when using stdin +* [2565](https://github.com/grafana/loki/pull/2565) **carlpett**: Add a counter for empty syslog messages +* [2542](https://github.com/grafana/loki/pull/2542) **slim-bean**: Promtail: implement shutdown for the no-op server +* [2532](https://github.com/grafana/loki/pull/2532) **slim-bean**: Promtail: Restart the tailer if we fail to read and upate current position + +#### Ksonnet +* [2719](https://github.com/grafana/loki/pull/2719) **halcyondude**: nit: fix formatting for ksonnet/loki +* [2677](https://github.com/grafana/loki/pull/2677) **sandeepsukhani**: fix jsonnet for memcached-writes when using boltdb-shipper +* [2617](https://github.com/grafana/loki/pull/2617) **periklis**: Add config options for loki dashboards +* [2612](https://github.com/grafana/loki/pull/2612) **fredr**: Dashboard: typo in Loki Operational dashboard +* [2599](https://github.com/grafana/loki/pull/2599) **sandeepsukhani**: fix closing bracket in 
dashboards from loki-mixin +* [2584](https://github.com/grafana/loki/pull/2584) **sandeepsukhani**: Read, Write and operational dashboard improvements +* [2560](https://github.com/grafana/loki/pull/2560) **owen-d**: Jsonnet/ruler +* [2547](https://github.com/grafana/loki/pull/2547) **sandeepsukhani**: jsonnet for running loki using boltdb-shipper +* [2525](https://github.com/grafana/loki/pull/2525) **Duologic**: fix(ksonnet): don't depend on specific k8s version +* [2521](https://github.com/grafana/loki/pull/2521) **charandas**: fix: broken links in Tanka documentation +* [2503](https://github.com/grafana/loki/pull/2503) **owen-d**: Ksonnet docs +* [2494](https://github.com/grafana/loki/pull/2494) **primeroz**: Jsonnet Promtail: Change function for mounting configmap in promtail daemonset + +#### Logstash +* [2607](https://github.com/grafana/loki/pull/2607) **adityacs**: Logstash cpu usage fix + +#### Build +* [2602](https://github.com/grafana/loki/pull/2602) **sandeepsukhani**: add support for building querytee +* [2561](https://github.com/grafana/loki/pull/2561) **tharun208**: Added logcli docker image +* [2549](https://github.com/grafana/loki/pull/2549) **simnv**: Ignore .exe files build for Windows +* [2527](https://github.com/grafana/loki/pull/2527) **owen-d**: Update docker-compose.yaml to use 1.6.0 + +#### Docker Logging Driver +* [2459](https://github.com/grafana/loki/pull/2459) **RaitoBezarius**: Docker logging driver: Add a keymod for the extra attributes from the Docker logging driver + +### Dependencies + +* Go Version: 1.14.2 +* Cortex Version: 85942c5703cf22b64cecfd291e7e7c42d1b8c30c + ## 1.6.1 (2020-08-24) This is a small release and only contains two fixes for Promtail: diff --git a/docs/sources/alerting/_index.md b/docs/sources/alerting/_index.md index 789ad311a867c..01f7e36093ae1 100644 --- a/docs/sources/alerting/_index.md +++ b/docs/sources/alerting/_index.md @@ -7,7 +7,7 @@ weight: 700 Loki includes a component called the Ruler, adapted from 
our upstream project, Cortex. The Ruler is responsible for continually evaluating a set of configurable queries and then alerting when certain conditions happen, e.g. a high percentage of error logs. -First, ensure the Ruler component is enabled. The following is a basic configuration which loads rules from configuration files (it requires `/tmp/rules` and `/tmp/scratch` exist): +First, ensure the Ruler component is enabled. The following is a basic configuration which loads rules from configuration files: ```yaml ruler: @@ -168,6 +168,8 @@ Because the rule files are identical to Prometheus rule files, we can interact w > **Note:** Not all commands in cortextool currently support Loki. +> **Note:** cortextool was intended to run against multi-tenant Loki, so commands need either an `--id=` flag set to the Loki instance ID or the environment variable `CORTEX_TENANT_ID` set. If Loki is running in single-tenant mode, the required ID is `fake` (yes, we know this might seem alarming, but it's totally fine; no, it can't be changed). + An example workflow is included below: ```sh diff --git a/docs/sources/configuration/examples.md b/docs/sources/configuration/examples.md index 0cafb00d006e4..4c0b28b3b52f4 100644 --- a/docs/sources/configuration/examples.md +++ b/docs/sources/configuration/examples.md @@ -179,7 +179,7 @@ storage_config: This is a configuration to deploy Loki depending only on storage solution, e.g. an S3-compatible API like minio. The ring configuration is based on the gossip memberlist -and the index is shipped to storage via [boltdb-shipper](../../operations/storage/boltdb-shipper/). +and the index is shipped to storage via [Single Store (boltdb-shipper)](../../operations/storage/boltdb-shipper/). 
```yaml auth_enabled: false diff --git a/docs/sources/getting-started/_index.md b/docs/sources/getting-started/_index.md index cbd57051ad1a8..a5fb52d196df2 100644 --- a/docs/sources/getting-started/_index.md +++ b/docs/sources/getting-started/_index.md @@ -4,6 +4,7 @@ weight: 300 --- # Getting started with Loki +1. [Getting Logs Into Loki](get-logs-into-loki/) 1. [Grafana](grafana/) 1. [LogCLI](logcli/) 1. [Labels](labels/) diff --git a/docs/sources/getting-started/grafana.md b/docs/sources/getting-started/grafana.md index 299aa8d176801..4ea2346666fe1 100644 --- a/docs/sources/getting-started/grafana.md +++ b/docs/sources/getting-started/grafana.md @@ -6,21 +6,22 @@ title: Loki in Grafana Grafana ships with built-in support for Loki for versions greater than [6.0](https://grafana.com/grafana/download/6.0.0). Using [6.3](https://grafana.com/grafana/download/6.3.0) or later is highly -recommended to take advantage of new LogQL functionality. +recommended to take advantage of new [LogQL]({{< relref "../logql/_index.md" >}}) functionality. 1. Log into your Grafana instance. If this is your first time running Grafana, the username and password are both defaulted to `admin`. -2. In Grafana, go to `Configuration` > `Data Sources` via the cog icon on the +1. In Grafana, go to `Configuration` > `Data Sources` via the cog icon on the left sidebar. -3. Click the big + Add data source button. -4. Choose Loki from the list. -5. The http URL field should be the address of your Loki server. For example, +1. Click the big + Add data source button. +1. Choose Loki from the list. +1. The http URL field should be the address of your Loki server. For example, when running locally or with Docker using port mapping, the address is likely `http://localhost:3100`. When running with docker-compose or Kubernetes, the address is likely `http://loki:3100`. -6. To see the logs, click Explore on the sidebar, select the Loki +1. 
To see the logs, click Explore on the sidebar, select the Loki datasource in the top-left dropdown, and then choose a log stream using the Log labels button. +1. Learn more about querying by reading about Loki's query language [LogQL]({{< relref "../logql/_index.md" >}}). Read more about Grafana's Explore feature in the [Grafana documentation](http://docs.grafana.org/features/explore) and on how to diff --git a/docs/sources/getting-started/logcli.md b/docs/sources/getting-started/logcli.md index 295adf3fe32c4..6f11a1e1aca49 100644 --- a/docs/sources/getting-started/logcli.md +++ b/docs/sources/getting-started/logcli.md @@ -3,7 +3,7 @@ title: LogCLI --- # Querying Loki with LogCLI -If you prefer a command line interface, LogCLI also allows users to run LogQL +If you prefer a command line interface, LogCLI also allows users to run [LogQL]({{< relref "../logql/_index.md" >}}) queries against a Loki server. ## Installation diff --git a/docs/sources/operations/storage/_index.md b/docs/sources/operations/storage/_index.md index 728d07ee94b7f..30d3867f1d728 100644 --- a/docs/sources/operations/storage/_index.md +++ b/docs/sources/operations/storage/_index.md @@ -3,6 +3,8 @@ title: Storage --- # Loki Storage +[High level storage overview here]({{< relref "../../storage/_index.md" >}}) + Loki needs to store two different types of data: **chunks** and **indexes**. 
Loki receives logs in separate streams, where each stream is uniquely identified @@ -25,11 +27,11 @@ For more information: The following are supported for the index: +- [Single Store (boltdb-shipper) - Recommended for 2.0 and newer](boltdb-shipper/) index store which stores boltdb index files in the object store - [Amazon DynamoDB](https://aws.amazon.com/dynamodb) - [Google Bigtable](https://cloud.google.com/bigtable) - [Apache Cassandra](https://cassandra.apache.org) - [BoltDB](https://github.com/boltdb/bolt) (doesn't work when clustering Loki) -- [BoltDB Shipper](boltdb-shipper/) EXPERIMENTAL index store which stores boltdb index files in the object store The following are supported for the chunks: diff --git a/docs/sources/operations/storage/boltdb-shipper.md b/docs/sources/operations/storage/boltdb-shipper.md index 8b237642b0d11..967f2c1b3db4a 100644 --- a/docs/sources/operations/storage/boltdb-shipper.md +++ b/docs/sources/operations/storage/boltdb-shipper.md @@ -1,9 +1,7 @@ --- -title: BoltDB Shipper +title: Single Store (boltdb-shipper) --- -# Loki with BoltDB Shipper - -:warning: BoltDB Shipper is still an experimental feature. It is not recommended to be used in production environments. +# Single Store Loki (boltdb-shipper index type) BoltDB Shipper lets you run Loki without any dependency on NoSQL stores for storing index. It locally stores the index in BoltDB files instead and keeps shipping those files to a shared object store i.e the same object store which is being used for storing chunks. 
diff --git a/docs/sources/operations/storage/filesystem.md b/docs/sources/operations/storage/filesystem.md index 6635195bb44a1..95d0cda2c5222 100644 --- a/docs/sources/operations/storage/filesystem.md +++ b/docs/sources/operations/storage/filesystem.md @@ -42,103 +42,3 @@ The durability of the objects is at the mercy of the filesystem itself where oth ### High Availability Running Loki clustered is not possible with the filesystem store unless the filesystem is shared in some fashion (NFS for example). However using shared filesystems is likely going to be a bad experience with Loki just as it is for almost every other application. - -## New AND VERY EXPERIMENTAL in 1.5.0: Horizontal scaling of the filesystem store - -**WARNING** as the title suggests, this is very new and potentially buggy, and it is also very likely configs around this feature will change over time. - -With that warning out of the way, the addition of the [boltdb-shipper](../boltdb-shipper/) index store has added capabilities making it possible to overcome many of the limitations listed above using the filesystem store, specifically running Loki with the filesystem store on separate machines but still operate as a cluster supporting replication, and write distribution via the hash ring. - -As mentioned in the title, this is very alpha at this point but we would love for people to try this and help us flush out bugs. - -Here is an example config to run with Loki: - -Use this config on multiple computers (or containers), do not run it on the same computer as Loki uses the hostname as the ID in the ring. - -Do not use a shared fileystem such as NFS for this, each machine should have its own filesystem - -```yaml -auth_enabled: false # single tenant mode - -server: - http_listen_port: 3100 - -ingester: - max_transfer_retries: 0 # Disable blocks transfers on ingesters shutdown or rollout. 
- chunk_idle_period: 2h # Let chunks sit idle for at least 2h before flushing, this helps to reduce total chunks in store - max_chunk_age: 2h # Let chunks get at least 2h old before flushing due to age, this helps to reduce total chunks in store - chunk_target_size: 1048576 # Target chunks of 1MB, this helps to reduce total chunks in store - chunk_retain_period: 30s - - query_store_max_look_back_period: -1 # This will allow the ingesters to query the store for all data - lifecycler: - heartbeat_period: 5s - interface_names: - - eth0 - join_after: 30s - num_tokens: 512 - ring: - heartbeat_timeout: 1m - kvstore: - consul: - consistent_reads: true - host: localhost:8500 - http_client_timeout: 20s - store: consul - replication_factor: 1 # This can be increased and probably should if you are running multiple machines! - -schema_config: - configs: - - from: 2018-04-15 - store: boltdb-shipper - object_store: filesystem - schema: v11 - index: - prefix: index_ - period: 168h - -storage_config: - boltdb_shipper: - shared_store: filesystem - active_index_directory: /tmp/loki/index - cache_location: /tmp/loki/boltdb-cache - filesystem: - directory: /tmp/loki/chunks - -limits_config: - enforce_metric_name: false - reject_old_samples: true - reject_old_samples_max_age: 168h - -chunk_store_config: - max_look_back_period: 0s # No limit how far we can look back in the store - -table_manager: - retention_deletes_enabled: false - retention_period: 0s # No deletions, infinite retention -``` - -It does require Consul to be running for the ring (any of the ring stores will work: consul, etcd, memberlist, Consul is used in this example) - -It is also required that Consul be available from each machine, this example only specifies `host: localhost:8500` you would likely need to change this to the correct hostname/ip and port of your consul server. 
- -**The config needs to be the same on every Loki instance!** - -The important piece of this config is `query_store_max_look_back_period: -1` this tells Loki to allow the ingesters to look in the store for all the data. - -Traffic can be sent to any of the Loki servers, it can be round-robin load balanced if desired. - -Each Loki instance will use Consul to properly route both read and write data to the correct Loki instance. - -Scaling up is as easy as adding more loki instances and letting them talk to the same ring. - -Scaling down is harder but possible. You would need to shutdown a Loki server then take everything in: - -```yaml - filesystem: - directory: /tmp/loki/chunks -``` - -And copy it to the same directory on another Loki server, there is currently no way to split the chunks between servers you must move them all. We expect to provide more options here in the future. - - diff --git a/docs/sources/operations/upgrade.md b/docs/sources/operations/upgrade.md index e045c73f367af..5eb05c19c1057 100644 --- a/docs/sources/operations/upgrade.md +++ b/docs/sources/operations/upgrade.md @@ -1,376 +1,21 @@ --- title: Upgrade --- -# Upgrading Loki -Every attempt is made to keep Loki backwards compatible, such that upgrades should be low risk and low friction. +**THIS PAGE HAS MOVED** -Unfortunately Loki is software and software is hard and sometimes things are not as easy as we want them to be. +This page was moved to a top level [Upgrading Guide]({{< relref "../upgrading/_index.md" >}}) -On this page we will document any upgrade issues/gotchas/considerations we are aware of. - - -## Master / Unreleased - -### IMPORTANT: `results_cache.max_freshness` removed from YAML config - -The `max_freshness` config from `results_cache` has been removed in favour of another flag called `max_cache_freshness_per_query` in `limits_config` which has the same effect. 
-If you happen to have `results_cache.max_freshness` set please use `limits_config.max_cache_freshness_per_query` YAML config instead. - -### Important: Roll out ingesters before queriers when using BoltDB-Shipper - -Ingesters now expose a new RPC method that queriers use when the index type is `boltdb-shipper`. -Queriers generally roll out faster than ingesters, so if new queriers query older ingesters using the new RPC, the queries would fail. -To avoid any query downtime during the upgrade, rollout ingesters before queriers. +The headers on this page are being kept for existing links. ## 1.6.0 -### Important: Ksonnet port changed and removed NET_BIND_SERVICE capability from Docker image - -In 1.5.0 we changed the Loki user to not run as root which created problems binding to port 80. -To address this we updated the docker image to add the NET_BIND_SERVICE capability to the loki process -which allowed Loki to bind to port 80 as a non root user, so long as the underlying system allowed that -linux capability. - -This has proved to be a problem for many reasons and in PR [2294](https://github.com/grafana/loki/pull/2294/files) -the capability was removed. - -It is now no longer possible for the Loki to be started with a port less than 1024 with the published docker image. - -The default for Helm has always been port 3100, and Helm users should be unaffect unless they changed the default. - -**Ksonnet users however should closely check their configuration, in PR 2294 the loki port was changed from 80 to 3100** - - -### IMPORTANT: If you run Loki in microservices mode, special rollout instructions - -A new ingester GRPC API has been added allowing to speed up metric queries, to ensure a rollout without query errors **make sure you upgrade all ingesters first.** -Once this is done you can then proceed with the rest of the deployment, this is to ensure that queriers won't look for an API not yet available. 
- -If you roll out everything at once, queriers with this new code will attempt to query ingesters which may not have the new method on the API and queries will fail. - -This will only affect reads(queries) and not writes and only for the duration of the rollout. - -### IMPORTANT: Scrape config changes to both Helm and Ksonnet will affect labels created by Promtail - -PR [2091](https://github.com/grafana/loki/pull/2091) Makes several changes to the promtail scrape config: - -```` -This is triggered by https://github.com/grafana/jsonnet-libs/pull/261 - -The above PR changes the instance label to be actually unique within -a scrape config. It also adds a pod and a container target label -so that metrics can easily be joined with metrics from cAdvisor, KSM, -and the Kubelet. - -This commit adds the same to the Loki scrape config. It also removes -the container_name label. It is the same as the container label -and was already added to Loki previously. However, the -container_name label is deprecated and has disappeared in K8s 1.16, -so that it will soon become useless for direct joining. 
-```` - -TL;DR - -The following label have been changed in both the Helm and Ksonnet Promtail scrape configs: - -`instance` -> `pod` -`container_name` -> `container` - - -### Experimental boltdb-shipper changes - -PR [2166](https://github.com/grafana/loki/pull/2166) now forces the index to have a period of exactly `24h`: - -Loki will fail to start with an error if the active schema or upcoming schema are not set to a period of `24h` - -You can add a new schema config like this: - -```yaml -schema_config: - configs: - - from: 2020-01-01 <----- This is your current entry, date will be different - store: boltdb-shipper - object_store: aws - schema: v11 - index: - prefix: index_ - period: 168h - - from: [INSERT FUTURE DATE HERE] <----- Add another entry, set a future date - store: boltdb-shipper - object_store: aws - schema: v11 - index: - prefix: index_ - period: 24h <--- This must be 24h -``` -If you are not on `schema: v11` this would be a good oportunity to make that change _in the new schema config_ also. - -**NOTE** If the current time in your timezone is after midnight UTC already, set the date one additional day forward. - -There was also a significant overhaul to how boltdb-shipper internals, this should not be visible to a user but as this -feature is experimental and under development bug are possible! - -The most noticeable change if you look in the storage, Loki no longer updates an existing file and instead creates a -new index file every 15mins, this is an important move to make sure objects in the object store are immutable and -will simplify future operations like compaction and deletion. 
- -### Breaking CLI flags changes - -The following CLI flags where changed to improve consistency, they are not expected to be widely used - -```diff -- querier.query_timeout -+ querier.query-timeout - -- distributor.extra-query-delay -+ querier.extra-query-delay - -- max-chunk-batch-size -+ store.max-chunk-batch-size - -- ingester.concurrent-flushed -+ ingester.concurrent-flushes -``` - -### Loki Canary metric name changes - -When adding some new features to the canary we realized the existing metrics were not compliant with standards for counter names, the following metrics have been renamed: - -```nohighlight -loki_canary_total_entries -> loki_canary_entries_total -loki_canary_out_of_order_entries -> loki_canary_out_of_order_entries_total -loki_canary_websocket_missing_entries -> loki_canary_websocket_missing_entries_total -loki_canary_missing_entries -> loki_canary_missing_entries_total -loki_canary_unexpected_entries -> loki_canary_unexpected_entries_total -loki_canary_duplicate_entries -> loki_canary_duplicate_entries_total -loki_canary_ws_reconnects -> loki_canary_ws_reconnects_total -loki_canary_response_latency -> loki_canary_response_latency_seconds -``` - -### Ksonnet Changes - -In `production/ksonnet/loki/config.libsonnet` the variable `storage_backend` used to have a default value of `'bigtable,gcs'`. -This has been changed to providing no default and will error if not supplied in your environment jsonnet, -here is an example of what you should add to have the same behavior as the default (namespace and cluster should already be defined): - -```jsonnet -_config+:: { - namespace: 'loki-dev', - cluster: 'us-central1', - storage_backend: 'gcs,bigtable', -``` - -Defaulting to `gcs,bigtable` was confusing for anyone using ksonnet with other storage backends as it would manifest itself with obscure bigtable errors. 
+[1.6.0 Upgrade Notes]({{< relref "../upgrading/_index.md#160" >}}) ## 1.5.0 -Note: The required upgrade path outlined for version 1.4.0 below is still true for moving to 1.5.0 from any release older than 1.4.0 (e.g. 1.3.0->1.5.0 needs to also look at the 1.4.0 upgrade requirements). - -### Breaking config changes! - -Loki 1.5.0 vendors Cortex v1.0.0 (congratulations!), which has a [massive list of changes](https://cortexmetrics.io/docs/changelog/#1-0-0-2020-04-02). - -While changes in the command line flags affect Loki as well, we usually recommend people to use configuration file instead. - -Cortex has done lot of cleanup in the configuration files, and you are strongly urged to take a look at the [annotated diff for config file](https://cortexmetrics.io/docs/changelog/#config-file-breaking-changes) before upgrading to Loki 1.5.0. - -Following fields were removed from YAML configuration completely: `claim_on_rollout` (always true), `normalise_tokens` (always true). - -#### Test Your Config - -To see if your config needs to change, one way to quickly test is to download a 1.5.0 (or newer) binary from the [release page](https://github.com/grafana/loki/releases/tag/v1.5.0) - -Then run the binary providing your config file `./loki-linux-amd64 -config.file=myconfig.yaml` - -If there are configs which are no longer valid you will see errors immediately: - -```shell -./loki-linux-amd64 -config.file=loki-local-config.yaml -failed parsing config: loki-local-config.yaml: yaml: unmarshal errors: - line 35: field dynamodbconfig not found in type aws.StorageConfig -``` - -Referencing the [list of diffs](https://cortexmetrics.io/docs/changelog/#config-file-breaking-changes) I can see this config changed: - -```diff -- dynamodbconfig: -+ dynamodb: -``` - -Also several other AWS related configs changed and would need to udpate those as well. 
- - -### Loki Docker Image User and File Location Changes - -To improve security concerns, in 1.5.0 the Docker container no longer runs the loki process as `root` and instead the process runs as user `loki` with UID `10001` and GID `10001` - -This may affect people in a couple ways: - -#### Loki Port - -If you are running Loki with a config that opens a port number above 1024 (which is the default, 3100 for HTTP and 9095 for GRPC) everything should work fine in regards to ports. - -If you are running Loki with a config that opens a port number less than 1024 Linux normally requires root permissions to do this, HOWEVER in the Docker container we run `setcap cap_net_bind_service=+ep /usr/bin/loki` - -This capability lets the loki process bind to a port less than 1024 when run as a non root user. - -Not every environment will allow this capability however, it's possible to restrict this capability in linux. If this restriction is in place, you will be forced to run Loki with a config that has HTTP and GRPC ports above 1024. - -#### Filesystem - -**Please note the location Loki is looking for files with the provided config in the docker image has changed** - -In 1.4.0 and earlier the included config file in the docker container was using directories: - -``` -/tmp/loki/index -/tmp/loki/chunks -``` - -In 1.5.0 this has changed: - -``` -/loki/index -/loki/chunks -``` - -This will mostly affect anyone using docker-compose or docker to run Loki and are specifying a volume to persist storage. 
- -**There are two concerns to track here, one is the correct ownership of the files and the other is making sure your mounts updated to the new location.** - -One possible upgrade path would look like this: - -If I were running Loki with this command `docker run -d --name=loki --mount source=loki-data,target=/tmp/loki -p 3100:3100 grafana/loki:1.4.0` - -This would mount a docker volume named `loki-data` to the `/tmp/loki` folder which is where Loki will persist the `index` and `chunks` folder in 1.4.0 - -To move to 1.5.0 I can do the following (please note that your container names and paths and volumes etc may be different): - -``` -docker stop loki -docker rm loki -docker run --rm --name="loki-perm" -it --mount source=loki-data,target=/mnt ubuntu /bin/bash -cd /mnt -chown -R 10001:10001 ./* -exit -docker run -d --name=loki --mount source=loki-data,target=/loki -p 3100:3100 grafana/loki:1.5.0 -``` - -Notice the change in the `target=/loki` for 1.5.0 to the new data directory location specified in the [included Loki config file](https://github.com/grafana/loki/tree/master/cmd/loki/loki-docker-config.yaml). - -The intermediate step of using an ubuntu image to change the ownership of the Loki files to the new user might not be necessary if you can easily access these files to run the `chown` command directly. -That is if you have access to `/var/lib/docker/volumes` or if you mounted to a different local filesystem directory, you can change the ownership directly without using a container. - - -### Loki Duration Configs - -If you get an error like: - -```nohighlight - ./loki-linux-amd64-1.5.0 -log.level=debug -config.file=/etc/loki/config.yml -failed parsing config: /etc/loki/config.yml: not a valid duration string: "0" -``` - -This is because of some underlying changes that no longer allow durations without a unit. 
- -Unfortunately the yaml parser doesn't give a line number but it's likely to be one of these two: - -```yaml -chunk_store_config: - max_look_back_period: 0s # DURATION VALUES MUST HAVE A UNIT EVEN IF THEY ARE ZERO - -table_manager: - retention_deletes_enabled: false - retention_period: 0s # DURATION VALUES MUST HAVE A UNIT EVEN IF THEY ARE ZERO -``` - -### Promtail Config Changes - -The underlying backoff library used in promtail had a config change which wasn't originally noted in the release notes: - -If you get this error: - -```nohighlight -Unable to parse config: /etc/promtail/promtail.yaml: yaml: unmarshal errors: - line 3: field maxbackoff not found in type util.BackoffConfig - line 4: field maxretries not found in type util.BackoffConfig - line 5: field minbackoff not found in type util.BackoffConfig -``` - -The new values are: - -```yaml -min_period: -max_period: -max_retries: -``` +[1.5.0 Upgrade Notes]({{< relref "../upgrading/_index.md#150" >}}) ## 1.4.0 -Loki 1.4.0 vendors Cortex v0.7.0-rc.0 which contains [several breaking config changes](https://github.com/cortexproject/cortex/blob/v0.7.0-rc.0/CHANGELOG). - -One such config change which will affect Loki users: - -In the [cache_config](../../configuration#cache_config): - -`defaul_validity` has changed to `default_validity` - -Also in the unlikely case you were configuring your schema via arguments and not a config file, this is no longer supported. This is not something we had ever provided as an option via docs and is unlikely anyone is doing, but worth mentioning. - -The other config changes should not be relevant to Loki. - -### Required Upgrade Path - -The newly vendored version of Cortex removes code related to de-normalized tokens in the ring. What you need to know is this: - -*Note:* A "shared ring" as mentioned below refers to using *consul* or *etcd* for values in the following config: - -```yaml -kvstore: - # The backend storage to use for the ring. 
Supported values are - # consul, etcd, inmemory - store: -``` - -- Running without using a shared ring (inmemory): No action required -- Running with a shared ring and upgrading from v1.3.0 -> v1.4.0: No action required -- Running with a shared ring and upgrading from any version less than v1.3.0 (e.g. v1.2.0) -> v1.4.0: **ACTION REQUIRED** - -There are two options for upgrade if you are not on version 1.3.0 and are using a shared ring: - -- Upgrade first to v1.3.0 **BEFORE** upgrading to v1.4.0 - -OR - -**Note:** If you are running a single binary you only need to add this flag to your single binary command. - -1. Add the following configuration to your ingesters command: `-ingester.normalise-tokens=true` -1. Restart your ingesters with this config -1. Proceed with upgrading to v1.4.0 -1. Remove the config option (only do this after everything is running v1.4.0) - -**Note:** It's also possible to enable this flag via config file, see the [`lifecycler_config`](https://github.com/grafana/loki/tree/v1.3.0/docs/configuration#lifecycler_config) configuration option. - -If using the Helm Loki chart: - -```yaml -extraArgs: - ingester.normalise-tokens: true -``` - -If using the Helm Loki-Stack chart: - -```yaml -loki: - extraArgs: - ingester.normalise-tokens: true -``` - -#### What will go wrong - -If you attempt to add a v1.4.0 ingester to a ring created by Loki v1.2.0 or older which does not have the commandline argument `-ingester.normalise-tokens=true` (or configured via [config file](https://github.com/grafana/loki/tree/v1.3.0/docs/configuration#lifecycler_config)), the v1.4.0 ingester will remove all the entries in the ring for all the other ingesters as it cannot "see" them. - -This will result in distributors failing to write and a general ingestion failure for the system. - -If this happens to you, you will want to rollback your deployment immediately. 
You need to remove the v1.4.0 ingester from the ring ASAP, this should allow the existing ingesters to re-insert their tokens. You will also want to remove any v1.4.0 distributors as they will not understand the old ring either and will fail to send traffic. +[1.4.0 Upgrade Notes]({{< relref "../upgrading/_index.md#140" >}}) \ No newline at end of file diff --git a/docs/sources/storage/_index.md b/docs/sources/storage/_index.md index b6384b18875e2..2675a8a28cabb 100644 --- a/docs/sources/storage/_index.md +++ b/docs/sources/storage/_index.md @@ -4,7 +4,19 @@ weight: 1010 --- # Storage -Loki uses a two pronged strategy regarding storage, which is responsible for both it's limitations and it's advantages. The main idea is that logs are large and traditional indexing strategies are prohibitively expensive and complex to run at scale. This often brings along ancillary procedure costs in the form of schema design, index management/rotation, backup/restore protocols, etc. Instead, Loki stores all the its log content unindexed in object storage. It then uses the Prometheus label paradigm along with a small but specialized index store to allow lookup, matching, and filtering based on the these labels. When a set of unique key/value label pairs are combined with their logs, we call this a _log stream_, which is generally analagous to a log file on disk. It may have labels like `{app="api", env="production", filename="/var/logs/app.log"}`, which together uniqely identify it. The object storage is responsible for storing the compressed logs cheaply while the index takes care of storing these labels in a way that enables fast, effective querying. +Unlike other logging systems, Loki is built around the idea of only indexing +metadata about your logs: labels (just like Prometheus labels). Log data itself +is then compressed and stored in chunks in object stores such as S3 or GCS, or +even locally on the filesystem. 
A small index and highly compressed chunks +simplify the operation and significantly lower the cost of Loki. + +Until Loki 2.0, index data was stored in a separate index. + +Loki 2.0 brings an index mechanism named 'boltdb-shipper', which is what we now call Single Store Loki. +This index type only requires one store, the object store, for both the index and chunks. +More detailed information can be found on the [operations page]({{< relref "../operations/storage/boltdb-shipper.md" >}}). + +Some more storage details can also be found in the [operations section]({{< relref "../operations/storage/_index.md" >}}). - [Storage](#storage) - [Implementations - Chunks](#implementations---chunks) @@ -14,12 +26,13 @@ Loki uses a two pronged strategy regarding storage, which is responsible for bot - [S3](#s3) - [Notable Mentions](#notable-mentions) - [Implementations - Index](#implementations---index) + - [Single Store (boltdb-shipper) - Recommended for 2.0 and newer](#single-store) - [Cassandra](#cassandra-1) - [BigTable](#bigtable) - [DynamoDB](#dynamodb) - [Rate Limiting](#rate-limiting) - [BoltDB](#boltdb) - - [Period Configs](#period-configs) + - [Schema Configs](#schema-configs) - [Table Manager](#table-manager) - [Provisioning](#provisioning) - [Upgrading Schemas](#upgrading-schemas) @@ -51,10 +64,16 @@ S3 is AWS's hosted object store. It is a good candidate for a managed object sto ### Notable Mentions -You may use any subsitutable services, such as those that implement the S3 API like [MinIO](https://min.io/). +You may use any substitutable services, such as those that implement the S3 API like [MinIO](https://min.io/). ## Implementations - Index +### Single-Store + +Also known as "boltdb-shipper" during development (and is still the schema `store` name). The single store configurations for Loki utilize the chunk store for both chunks and the index, requiring just one store to run Loki.
+ +As of 2.0, this is the recommended index storage type, performance is comparable to a dedicated index type while providing a much less expensive and less complicated deployment. + ### Cassandra Cassandra can also be utilized for the index store and aside from the experimental [boltdb-shipper](../operations/storage/boltdb-shipper/), it's the only non-cloud offering that can be used for the index that's horizontally scalable and has configurable replication. It's a good candidate when you already run Cassandra, are running on-prem, or do not wish to use a managed cloud offering. @@ -75,9 +94,9 @@ DynamoDB is susceptible to rate limiting, particularly due to overconsuming what BoltDB is an embedded database on disk. It is not replicated and thus cannot be used for high availability or clustered Loki deployments, but is commonly paired with a `filesystem` chunk store for proof of concept deployments, trying out Loki, and development. There is also an experimental mode, the [boltdb-shipper](../operations/storage/boltdb-shipper/), which aims to support clustered deployments using `boltdb` as an index. -## Period Configs +## Schema Configs -Loki aims to be backwards compatible and over the course of it's development has had many internal changes that facilitate better and more efficient storage/querying. Loki allows incrementally upgrading to these new storage _schemas_ and can query across them transparently. This makes upgrading a breeze. For instance, this is what it looks like when migrating from the v10 -> v11 schemas starting 2020-07-01: +Loki aims to be backwards compatible and over the course of its development has had many internal changes that facilitate better and more efficient storage/querying. Loki allows incrementally upgrading to these new storage _schemas_ and can query across them transparently. This makes upgrading a breeze. 
For instance, this is what it looks like when migrating from the v10 -> v11 schemas starting 2020-07-01: ```yaml schema_config: @@ -175,46 +194,32 @@ For more information, see the [retention configuration](../operations/storage/re ### Single machine/local development (boltdb+filesystem) -```yaml -storage_config: - boltdb: - directory: /tmp/loki/index - filesystem: - directory: /tmp/loki/chunks - -schema_config: - configs: - - from: 2020-07-01 - store: boltdb - object_store: filesystem - schema: v11 - index: - prefix: index_ - period: 168h -``` +[The repo contains a working example](https://github.com/grafana/loki/blob/master/cmd/loki/loki-local-config.yaml), you may want to checkout a tag of the repo to make sure you get a compatible example. -### GCP deployment (GCS+BigTable) +### GCP deployment (GCS Single Store) ```yaml storage_config: - bigtable: - instance: - project: + boltdb_shipper: + active_index_directory: /loki/boltdb-shipper-active + cache_location: /loki/boltdb-shipper-cache + cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space + shared_store: gcs gcs: bucket_name: schema_config: configs: - from: 2020-07-01 - store: bigtable + store: boltdb-shipper object_store: gcs schema: v11 index: prefix: index_ - period: 168h + period: 24h ``` -### AWS deployment (S3+DynamoDB) +### AWS deployment (S3 Single Store) ```yaml storage_config: @@ -232,7 +237,7 @@ schema_config: schema: v11 index: prefix: index_ - period: 168h + period: 24h ``` If you don't wish to hard-code S3 credentials, you can also configure an EC2 @@ -249,6 +254,8 @@ storage_config: ### On prem deployment (Cassandra+Cassandra) +**Keeping this for posterity, but this is likely not a common config. 
Cassandra should work and could be faster in some situations but is likely much more expensive.** + ```yaml storage_config: cassandra: @@ -273,7 +280,7 @@ schema_config: ``` -### On prem deployment (Cassandra+MinIO) +### On prem deployment (MinIO Single Store) We configure MinIO by using the AWS config because MinIO implements the S3 API: @@ -284,20 +291,19 @@ storage_config: # full example: http://loki:supersecret@localhost.:9000 s3: http://:@: s3forcepathstyle: true - cassandra: - addresses: - keyspace: - auth: - username: # only applicable when auth=true - password: # only applicable when auth=true + boltdb_shipper: + active_index_directory: /loki/boltdb-shipper-active + cache_location: /loki/boltdb-shipper-cache + cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space + shared_store: s3 schema_config: configs: - from: 2020-07-01 - store: cassandra + store: boltdb-shipper object_store: aws schema: v11 index: prefix: index_ - period: 168h + period: 24h ``` diff --git a/docs/sources/upgrading/_index.md b/docs/sources/upgrading/_index.md new file mode 100644 index 0000000000000..d9d2bf19a54ad --- /dev/null +++ b/docs/sources/upgrading/_index.md @@ -0,0 +1,559 @@ +--- +title: Upgrading +weight: 250 +--- + +# Upgrading Loki + +Every attempt is made to keep Loki backwards compatible, such that upgrades should be low risk and low friction. + +Unfortunately Loki is software and software is hard and sometimes we are forced to make decisions between ease of use and ease of maintenance. + +If we have any expectation of difficulty upgrading we will document it here. + +As more versions are released it becomes more likely unexpected problems arise moving between multiple versions at once. +If possible try to stay current and do sequential updates. If you want to skip versions, try it in a development environment before attempting to upgrade production. 
+ + +## Master / Unreleased + +_add changes here which are unreleased_ + +## 2.0.0 + +This is a major Loki release and there are some very important upgrade considerations. +For the most part, there are very few impactful changes and for most this will be a seamless upgrade. + +2.0.0 Upgrade Topics: + +* [IMPORTANT If you are using a docker image, read this!](#important-if-you-are-using-a-docker-image-read-this) +* [IMPORTANT boltdb-shipper upgrade considerations](#important-boltdb-shipper-upgrade-considerations) +* [IMPORTANT results_cachemax_freshness removed from yaml config](#important-results_cachemax_freshness-removed-from-yaml-config) +* [Promtail removed entry_parser config](#promtail-config-removed) +* [If you would like to use the new single store index and v11 schema](#upgrading-schema-to-use-boltdb-shipper-andor-v11-schema) + +### **IMPORTANT If you are using a docker image, read this!** + +(This includes, Helm, Tanka, docker-compose etc.) + +The default config file in the docker image, as well as the default helm values.yaml and jsonnet for Tanka all specify a schema definition to make things easier to get started. + +>**If you have not specified your own config file with your own schema definition (or you do not have a custom schema definition in your values.yaml), upgrading to 2.0 will break things!** + +In 2.0 the defaults are now v11 schema and the `boltdb-shipper` index type. + + +If you are using an index type of `aws`, `bigtable`, or `cassandra` this means you have already defined a custom schema and there is _nothing_ further you need to do regarding the schema. +You can consider however adding a new schema entry to use the new `boltdb-shipper` type if you want to move away from these separate index stores and instead use just one object store. + +#### What to do + +The minimum action required is to create a config which specifies the schema to match what the previous defaults were. 
+ +(Keep in mind this will only tell Loki to use the old schema default, if you would like to upgrade to v11 and/or move to the single store boltdb-shipper, [see below](#upgrading-schema-to-use-boltdb-shipper-andor-v11-schema)) + +There are three places we have hard coded the schema definition: + +##### Helm + +Helm has shipped with the same internal schema in the values.yaml file for a very long time. + +If you are providing your own values.yaml file then there is no _required_ action because you will already have a fixed schema version. + +**If you are not providing your own values.yaml file, you will need to make one! and at a minimum it will need this config:** + +```yaml +schema_config: + configs: + - from: 2018-04-15 + store: boltdb + object_store: filesystem + schema: v9 + index: + prefix: index_ + period: 168h +``` + +This matches what the default values.yaml file had prior to 2.0 and is necessary for Loki to work post 2.0 + +##### Tanka + +This likely only affects a small portion of tanka users because the default schema config for Loki was forcing `GCS` and `bigtable`. + +**If your `main.jsonnet` (or somewhere in your manually created jsonnet) does not have a schema config section then you will need to add one like this!** + +```jsonnet +{ + _config+:: { + using_boltdb_shipper: false, + loki+: { + schema_config+: { + configs: [{ + from: '2018-04-15', + store: 'bigtable', + object_store: 'gcs', + schema: 'v11', + index: { + prefix: '%s_index_' % $._config.table_prefix, + period: '168h', + }, + }], + }, + }, + } +} +``` + +>**NOTE** If you had set `index_period_hours` to a value other than 168h (the previous default) you must update this in the above config `period:` to match what you chose. 
+ +>**NOTE** We have changed the default index store to `boltdb-shipper`, so it's important to add `using_boltdb_shipper: false,` until you are ready to change (if you want to change) + +Changing the jsonnet config to use the `boltdb-shipper` type is the same as [below](#upgrading-schema-to-use-boltdb-shipper-andor-v11-schema) where you need to add a new schema section. + +**HOWEVER** Be aware when you change `using_boltdb_shipper: true` the deployment type for the ingesters and queriers will change to statefulsets! Statefulsets are required for the ingester and querier using boltdb-shipper. + +##### Docker (e.g. docker-compose) + +For docker related cases you will have to mount a Loki config file separate from what's shipped inside the container + +I would recommend taking the previous default file from the [1.6.0 tag on github](https://raw.githubusercontent.com/grafana/loki/v1.6.0/cmd/loki/loki-docker-config.yaml) + +How you get this mounted and in use by Loki might vary based on how you are using the image, but this is a common example: + +```shell +docker run -d --name=loki --mount type=bind,source="path to loki-config.yaml",target=/etc/loki/local-config.yaml grafana/loki:2.0.0 +``` + +The Loki docker image is expecting to find the config file at `/etc/loki/local-config.yaml` + + +### IMPORTANT: boltdb-shipper upgrade considerations. + +Significant changes have taken place between 1.6.0 and 2.0.0 for the boltdb-shipper index type. If you are already running this index and are upgrading, some extra caution is warranted. + +Please strongly consider taking a complete backup of the `index` directory in your object store. This location might be slightly different depending on what store you use. +It should be a folder named index with a bunch of folders inside with names like `index_18561`,`index_18560`... + +The chunks directory should not need any special backups. + +If you have an environment to test this in please do so before upgrading against critical data.
+ +There are 2 significant changes warranting the backup of this data because they will make rolling back impossible: +* A compactor is included which will take existing index files and compact them to one per day and remove non compacted files +* All index files are now gzipped before uploading + +The second part is important because 1.6.0 does not understand how to read the gzipped files, so any new files uploaded or any files compacted become unreadable to 1.6.0 or earlier. + +_THIS BEING SAID_ we are not expecting problems, our testing so far has not uncovered any problems, but some extra precaution might save data loss in unforeseen circumstances! + +Please report any problems via GitHub issues or reach us on the #loki slack channel. + +**Note if you are using boltdb-shipper and were running with high availability and separate filesystems** + +This was a poorly documented and even more experimental mode we toyed with using boltdb-shipper. For now we removed the documentation and also any kind of support for this mode. + +To use boltdb-shipper in 2.0 you need a shared storage (S3, GCS, etc), the mode of running with separate filesystem stores in HA using a ring is not officially supported. + +We didn't do anything explicitly to limit this functionality, however we have not had any time to actually test this, which is why we removed the docs and are listing it as not supported. + +#### If running in microservices, deploy ingesters before queriers + +Ingesters now expose a new RPC method that queriers use when the index type is `boltdb-shipper`. +Queriers generally roll out faster than ingesters, so if new queriers query older ingesters using the new RPC, the queries would fail. +To avoid any query downtime during the upgrade, roll out ingesters before queriers.
+ +### IMPORTANT: `results_cache.max_freshness` removed from YAML config + +The `max_freshness` config from `results_cache` has been removed in favour of another flag called `max_cache_freshness_per_query` in `limits_config` which has the same effect. +If you happen to have `results_cache.max_freshness` set please use `limits_config.max_cache_freshness_per_query` YAML config instead. + +### Promtail config removed + +The long deprecated `entry_parser` config in Promtail has been removed, use [pipeline_stages]({{< relref "../clients/promtail/configuration/#pipeline_stages" >}}) instead. + +### Upgrading schema to use boltdb-shipper and/or v11 schema + +If you would also like to take advantage of the new Single Store (boltdb-shipper) index, as well as the v11 schema if you aren't already using it. + +You can do this by adding a new schema entry. + +Here is an example: + +```yaml +schema_config: + configs: + - from: 2018-04-15 ① + store: boltdb ①④ + object_store: filesystem ①④ + schema: v11 ② + index: + prefix: index_ ① + period: 168h ① + - from: 2020-10-24 ③ + store: boltdb-shipper + object_store: filesystem ④ + schema: v11 + index: + prefix: index_ + period: 24h ⑤ +``` +① Make sure all of these match your current schema config +② Make sure this matches your previous schema version, Helm for example is likely v9 +③ Make sure this is a date in the **FUTURE** keep in mind Loki only knows UTC so make sure it's a future UTC date +④ Make sure this matches your existing config (e.g. maybe you were using gcs for your object_store) +⑤ 24h is required for boltdb-shipper + +There are more examples on the [Storage description page]({{< relref "../storage/_index.md#examples" >}}) + + +## 1.6.0 + +### Important: Ksonnet port changed and removed NET_BIND_SERVICE capability from Docker image + +In 1.5.0 we changed the Loki user to not run as root which created problems binding to port 80. 
+To address this we updated the docker image to add the NET_BIND_SERVICE capability to the loki process
+which allowed Loki to bind to port 80 as a non root user, so long as the underlying system allowed that
+linux capability.
+
+This has proved to be a problem for many reasons and in PR [2294](https://github.com/grafana/loki/pull/2294/files)
+the capability was removed.
+
+It is no longer possible for Loki to be started with a port less than 1024 with the published docker image.
+
+The default for Helm has always been port 3100, and Helm users should be unaffected unless they changed the default.
+
+**Ksonnet users however should closely check their configuration, in PR 2294 the loki port was changed from 80 to 3100**
+
+
+### IMPORTANT: If you run Loki in microservices mode, special rollout instructions
+
+A new ingester GRPC API has been added to speed up metric queries. To ensure a rollout without query errors **make sure you upgrade all ingesters first.**
+Once this is done you can then proceed with the rest of the deployment. This ensures that queriers won't look for an API that is not yet available.
+
+If you roll out everything at once, queriers with this new code will attempt to query ingesters which may not have the new method on the API and queries will fail.
+
+This will only affect reads (queries) and not writes, and only for the duration of the rollout.
+
+### IMPORTANT: Scrape config changes to both Helm and Ksonnet will affect labels created by Promtail
+
+PR [2091](https://github.com/grafana/loki/pull/2091) makes several changes to the promtail scrape config:
+
+````
+This is triggered by https://github.com/grafana/jsonnet-libs/pull/261
+
+The above PR changes the instance label to be actually unique within
+a scrape config. It also adds a pod and a container target label
+so that metrics can easily be joined with metrics from cAdvisor, KSM,
+and the Kubelet.
+
+This commit adds the same to the Loki scrape config.
It also removes
+the container_name label. It is the same as the container label
+and was already added to Loki previously. However, the
+container_name label is deprecated and has disappeared in K8s 1.16,
+so that it will soon become useless for direct joining.
+````
+
+TL;DR
+
+The following labels have been changed in both the Helm and Ksonnet Promtail scrape configs:
+
+`instance` -> `pod`
+`container_name` -> `container`
+
+
+### Experimental boltdb-shipper changes
+
+PR [2166](https://github.com/grafana/loki/pull/2166) now forces the index to have a period of exactly `24h`:
+
+Loki will fail to start with an error if the active schema or upcoming schema are not set to a period of `24h`
+
+You can add a new schema config like this:
+
+```yaml
+schema_config:
+  configs:
+    - from: 2020-01-01                  # This is your current entry, date will be different
+      store: boltdb-shipper
+      object_store: aws
+      schema: v11
+      index:
+        prefix: index_
+        period: 168h
+    - from: [INSERT FUTURE DATE HERE]   # Add another entry, set a future date
+      store: boltdb-shipper
+      object_store: aws
+      schema: v11
+      index:
+        prefix: index_
+        period: 24h                     # This must be 24h
+```
+If you are not on `schema: v11` this would be a good opportunity to make that change _in the new schema config_ also.
+
+**NOTE** If the current time in your timezone is after midnight UTC already, set the date one additional day forward.
+
+There was also a significant overhaul of the boltdb-shipper internals. This should not be visible to a user, but as this
+feature is experimental and under development, bugs are possible!
+
+The most noticeable change, if you look in the storage, is that Loki no longer updates an existing file and instead creates a
+new index file every 15mins. This is an important move to make sure objects in the object store are immutable and
+will simplify future operations like compaction and deletion.
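One way to avoid the timezone trap described in the NOTE above is to compute the `from` date directly in UTC rather than reading it off a local clock. A minimal sketch, assuming GNU `date` (for the `-d` option); the function name `next_schema_date` and the default margin of two days are hypothetical choices, not something Loki prescribes:

```shell
# Hypothetical helper: print a UTC calendar date a few days in the future,
# suitable for pasting into the `from:` field of the new schema entry.
next_schema_date() {
  local days_ahead="${1:-2}"  # assumption: two days leaves room for the rollout
  # -u forces UTC, so the result does not depend on the local timezone.
  date -u -d "+${days_ahead} days" +%Y-%m-%d
}

next_schema_date
```

Because the date is computed in UTC, it is always later than the current UTC day no matter where the command is run.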
+
+### Breaking CLI flags changes
+
+The following CLI flags were changed to improve consistency. They are not expected to be widely used.
+
+```diff
+- querier.query_timeout
++ querier.query-timeout
+
+- distributor.extra-query-delay
++ querier.extra-query-delay
+
+- max-chunk-batch-size
++ store.max-chunk-batch-size
+
+- ingester.concurrent-flushed
++ ingester.concurrent-flushes
+```
+
+### Loki Canary metric name changes
+
+When adding some new features to the canary we realized the existing metrics were not compliant with standards for counter names. The following metrics have been renamed:
+
+```nohighlight
+loki_canary_total_entries               ->      loki_canary_entries_total
+loki_canary_out_of_order_entries        ->      loki_canary_out_of_order_entries_total
+loki_canary_websocket_missing_entries   ->      loki_canary_websocket_missing_entries_total
+loki_canary_missing_entries             ->      loki_canary_missing_entries_total
+loki_canary_unexpected_entries          ->      loki_canary_unexpected_entries_total
+loki_canary_duplicate_entries           ->      loki_canary_duplicate_entries_total
+loki_canary_ws_reconnects               ->      loki_canary_ws_reconnects_total
+loki_canary_response_latency            ->      loki_canary_response_latency_seconds
+```
+
+### Ksonnet Changes
+
+In `production/ksonnet/loki/config.libsonnet` the variable `storage_backend` used to have a default value of `'bigtable,gcs'`.
+This has been changed to provide no default, and it will error if not supplied in your environment jsonnet.
+Here is an example of what you should add to have the same behavior as the default (namespace and cluster should already be defined):
+
+```jsonnet
+_config+:: {
+    namespace: 'loki-dev',
+    cluster: 'us-central1',
+    storage_backend: 'gcs,bigtable',
+},
+```
+
+Defaulting to `gcs,bigtable` was confusing for anyone using ksonnet with other storage backends as it would manifest itself with obscure bigtable errors.
+
+## 1.5.0
+
+Note: The required upgrade path outlined for version 1.4.0 below is still true for moving to 1.5.0 from any release older than 1.4.0 (e.g. 1.3.0->1.5.0 needs to also look at the 1.4.0 upgrade requirements).
+
+### Breaking config changes!
+
+Loki 1.5.0 vendors Cortex v1.0.0 (congratulations!), which has a [massive list of changes](https://cortexmetrics.io/docs/changelog/#1-0-0-2020-04-02).
+
+While changes in the command line flags affect Loki as well, we usually recommend that people use a configuration file instead.
+
+Cortex has done a lot of cleanup in the configuration files, and you are strongly urged to take a look at the [annotated diff for config file](https://cortexmetrics.io/docs/changelog/#config-file-breaking-changes) before upgrading to Loki 1.5.0.
+
+The following fields were removed from the YAML configuration completely: `claim_on_rollout` (always true), `normalise_tokens` (always true).
+
+#### Test Your Config
+
+To see if your config needs to change, one quick test is to download a 1.5.0 (or newer) binary from the [release page](https://github.com/grafana/loki/releases/tag/v1.5.0)
+
+Then run the binary, providing your config file: `./loki-linux-amd64 -config.file=myconfig.yaml`
+
+If there are configs which are no longer valid you will see errors immediately:
+
+```shell
+./loki-linux-amd64 -config.file=loki-local-config.yaml
+failed parsing config: loki-local-config.yaml: yaml: unmarshal errors:
+  line 35: field dynamodbconfig not found in type aws.StorageConfig
+```
+
+Referencing the [list of diffs](https://cortexmetrics.io/docs/changelog/#config-file-breaking-changes) I can see this config changed:
+
+```diff
+-  dynamodbconfig:
++  dynamodb:
+```
+
+Several other AWS related configs also changed, and you would need to update those as well.
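As a rough pre-check before running the binary, a config file can be scanned for the three keys this section names as removed or renamed. This is only a sketch: the function name `check_removed_keys` is hypothetical, the key list covers nothing beyond `claim_on_rollout`, `normalise_tokens` and `dynamodbconfig`, and it is not a substitute for the binary test described above, which catches every invalid field.

```shell
# Hypothetical helper: report lines in a Loki config that still use keys
# named in this section as removed or renamed in 1.5.0.
check_removed_keys() {
  local config_file="$1"
  local found=0
  local key
  for key in claim_on_rollout normalise_tokens dynamodbconfig; do
    # -n prints the line number so the offending entry is easy to locate.
    if grep -nE "^[[:space:]]*${key}:" "${config_file}"; then
      found=1
    fi
  done
  # Exit status 0 means none of the listed keys were found.
  return "${found}"
}
```

Running `check_removed_keys myconfig.yaml` prints each matching line with its line number and returns a non-zero status if any were found.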
+
+
+### Loki Docker Image User and File Location Changes
+
+To improve security, in 1.5.0 the Docker container no longer runs the loki process as `root`. Instead the process runs as user `loki` with UID `10001` and GID `10001`
+
+This may affect people in a couple of ways:
+
+#### Loki Port
+
+If you are running Loki with a config that opens a port number above 1024 (which is the default, 3100 for HTTP and 9095 for GRPC) everything should work fine in regards to ports.
+
+If you are running Loki with a config that opens a port number less than 1024, Linux normally requires root permissions to do this, HOWEVER in the Docker container we run `setcap cap_net_bind_service=+ep /usr/bin/loki`
+
+This capability lets the loki process bind to a port less than 1024 when run as a non root user.
+
+Not every environment will allow this capability, however, as it's possible to restrict it in linux. If this restriction is in place, you will be forced to run Loki with a config that has HTTP and GRPC ports above 1024.
+
+#### Filesystem
+
+**Please note the location Loki is looking for files with the provided config in the docker image has changed**
+
+In 1.4.0 and earlier the included config file in the docker container was using directories:
+
+```
+/tmp/loki/index
+/tmp/loki/chunks
+```
+
+In 1.5.0 this has changed:
+
+```
+/loki/index
+/loki/chunks
+```
+
+This will mostly affect anyone who is using docker-compose or docker to run Loki and is specifying a volume to persist storage.
+
+**There are two concerns to track here: one is the correct ownership of the files and the other is making sure your mounts are updated to the new location.**
+
+One possible upgrade path would look like this:
+
+If I were running Loki with this command `docker run -d --name=loki --mount source=loki-data,target=/tmp/loki -p 3100:3100 grafana/loki:1.4.0`
+
+This would mount a docker volume named `loki-data` to the `/tmp/loki` folder, which is where Loki will persist the `index` and `chunks` folders in 1.4.0
+
+To move to 1.5.0 I can do the following (please note that your container names and paths and volumes etc may be different):
+
+```
+docker stop loki
+docker rm loki
+docker run --rm --name="loki-perm" -it --mount source=loki-data,target=/mnt ubuntu /bin/bash
+cd /mnt
+chown -R 10001:10001 ./*
+exit
+docker run -d --name=loki --mount source=loki-data,target=/loki -p 3100:3100 grafana/loki:1.5.0
+```
+
+Notice the change to `target=/loki` for 1.5.0, which is the new data directory location specified in the [included Loki config file](https://github.com/grafana/loki/tree/master/cmd/loki/loki-docker-config.yaml).
+
+The intermediate step of using an ubuntu image to change the ownership of the Loki files to the new user might not be necessary if you can easily access these files to run the `chown` command directly.
+That is, if you have access to `/var/lib/docker/volumes` or if you mounted to a different local filesystem directory, you can change the ownership directly without using a container.
+
+
+### Loki Duration Configs
+
+If you get an error like:
+
+```nohighlight
+ ./loki-linux-amd64-1.5.0 -log.level=debug -config.file=/etc/loki/config.yml
+failed parsing config: /etc/loki/config.yml: not a valid duration string: "0"
+```
+
+This is because of some underlying changes that no longer allow durations without a unit.
+ +Unfortunately the yaml parser doesn't give a line number but it's likely to be one of these two: + +```yaml +chunk_store_config: + max_look_back_period: 0s # DURATION VALUES MUST HAVE A UNIT EVEN IF THEY ARE ZERO + +table_manager: + retention_deletes_enabled: false + retention_period: 0s # DURATION VALUES MUST HAVE A UNIT EVEN IF THEY ARE ZERO +``` + +### Promtail Config Changes + +The underlying backoff library used in promtail had a config change which wasn't originally noted in the release notes: + +If you get this error: + +```nohighlight +Unable to parse config: /etc/promtail/promtail.yaml: yaml: unmarshal errors: + line 3: field maxbackoff not found in type util.BackoffConfig + line 4: field maxretries not found in type util.BackoffConfig + line 5: field minbackoff not found in type util.BackoffConfig +``` + +The new values are: + +```yaml +min_period: +max_period: +max_retries: +``` + +## 1.4.0 + +Loki 1.4.0 vendors Cortex v0.7.0-rc.0 which contains [several breaking config changes](https://github.com/cortexproject/cortex/blob/v0.7.0-rc.0/CHANGELOG). + +One such config change which will affect Loki users: + +In the [cache_config](../../configuration#cache_config): + +`defaul_validity` has changed to `default_validity` + +Also in the unlikely case you were configuring your schema via arguments and not a config file, this is no longer supported. This is not something we had ever provided as an option via docs and is unlikely anyone is doing, but worth mentioning. + +The other config changes should not be relevant to Loki. + +### Required Upgrade Path + +The newly vendored version of Cortex removes code related to de-normalized tokens in the ring. What you need to know is this: + +*Note:* A "shared ring" as mentioned below refers to using *consul* or *etcd* for values in the following config: + +```yaml +kvstore: + # The backend storage to use for the ring. 
Supported values are + # consul, etcd, inmemory + store: +``` + +- Running without using a shared ring (inmemory): No action required +- Running with a shared ring and upgrading from v1.3.0 -> v1.4.0: No action required +- Running with a shared ring and upgrading from any version less than v1.3.0 (e.g. v1.2.0) -> v1.4.0: **ACTION REQUIRED** + +There are two options for upgrade if you are not on version 1.3.0 and are using a shared ring: + +- Upgrade first to v1.3.0 **BEFORE** upgrading to v1.4.0 + +OR + +**Note:** If you are running a single binary you only need to add this flag to your single binary command. + +1. Add the following configuration to your ingesters command: `-ingester.normalise-tokens=true` +1. Restart your ingesters with this config +1. Proceed with upgrading to v1.4.0 +1. Remove the config option (only do this after everything is running v1.4.0) + +**Note:** It's also possible to enable this flag via config file, see the [`lifecycler_config`](https://github.com/grafana/loki/tree/v1.3.0/docs/configuration#lifecycler_config) configuration option. + +If using the Helm Loki chart: + +```yaml +extraArgs: + ingester.normalise-tokens: true +``` + +If using the Helm Loki-Stack chart: + +```yaml +loki: + extraArgs: + ingester.normalise-tokens: true +``` + +#### What will go wrong + +If you attempt to add a v1.4.0 ingester to a ring created by Loki v1.2.0 or older which does not have the commandline argument `-ingester.normalise-tokens=true` (or configured via [config file](https://github.com/grafana/loki/tree/v1.3.0/docs/configuration#lifecycler_config)), the v1.4.0 ingester will remove all the entries in the ring for all the other ingesters as it cannot "see" them. + +This will result in distributors failing to write and a general ingestion failure for the system. + +If this happens to you, you will want to rollback your deployment immediately. 
You need to remove the v1.4.0 ingester from the ring ASAP; this should allow the existing ingesters to re-insert their tokens. You will also want to remove any v1.4.0 distributors, as they will not understand the old ring either and will fail to send traffic.