SLIs and SLOs with Prometheus and Grafana for your APIs managed by Tyk

Ported over from Sonja's most excellent OSS repo

About

This demonstrates how to configure Tyk Gateway, Tyk Pump, Prometheus and Grafana OSS to set up a dashboard with SLIs and SLOs for your APIs managed by Tyk.

You can use it to explore the Prometheus metrics exposed by Tyk Pump and use them in a Grafana dashboard.

Setup

Run the up.sh script with the slo-prometheus-grafana parameter:

./up.sh slo-prometheus-grafana

Generate traffic

K6 is used to generate traffic to the API endpoints. The load script load.js will run for 15 minutes.

./docker-compose-command.sh run k6 run /scripts/load.js

You will see the K6 output in your terminal.

Check out the dashboard in Grafana

Go to Grafana in your browser (initial user/pwd: admin/admin) and open the dashboard called SLOs for APIs managed by Tyk.

You should see the data coming in. You can also filter the data per API.

How this works

Configuration

- Tyk API Gateway is configured to expose two API endpoints:
  - httpbin (see .json config)
  - httpstatus (see .json config)
- K6 uses the load script load.js to generate demo traffic to the API endpoints.
- Tyk Pump is configured to expose a metrics endpoint for Prometheus (see config) with two custom metrics called tyk_http_requests_total and tyk_http_latency. Custom metrics require Tyk Pump version 1.6 or later.
- Prometheus
  - prometheus.yml is configured to automatically scrape Tyk Pump's metrics endpoint
  - slos.rules.yml is used to calculate additional metrics needed for the remaining error budget
- Grafana
  - prometheus_ds.yml is configured to connect Grafana automatically to Prometheus
  - SLOs-for-APIs-managed-by-Tyk.json is the dashboard definition
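To get a feel for what Prometheus scrapes from Tyk Pump, the sketch below parses a small sample in the Prometheus text exposition format and sums tyk_http_requests_total per API. The sample values and the label names api_name and response_code are assumptions made for illustration; check the Pump config in this demo for the labels it actually exports.

```python
import re
from collections import defaultdict

# Hypothetical sample of a metrics endpoint response. The label names
# (api_name, response_code) and the values are assumptions, not taken
# from the demo's Pump configuration.
SAMPLE = '''\
# HELP tyk_http_requests_total Total of API requests
# TYPE tyk_http_requests_total counter
tyk_http_requests_total{api_name="httpbin",response_code="200"} 180
tyk_http_requests_total{api_name="httpbin",response_code="500"} 20
tyk_http_requests_total{api_name="httpstatus",response_code="200"} 95
tyk_http_requests_total{api_name="httpstatus",response_code="503"} 5
'''

# metric_name{label="value",...} sample_value
LINE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def requests_by_api(text, metric="tyk_http_requests_total"):
    """Return {api_name: {"total": n, "errors": n}} from exposition text."""
    totals = defaultdict(lambda: {"total": 0.0, "errors": 0.0})
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match or match.group(1) != metric:
            continue  # skip comments, blank lines and other metrics
        labels = dict(LABEL.findall(match.group(2)))
        value = float(match.group(3))
        entry = totals[labels.get("api_name", "unknown")]
        entry["total"] += value
        # 5XX responses are the ones that count against the SLO
        if labels.get("response_code", "").startswith("5"):
            entry["errors"] += value
    return dict(totals)


if __name__ == "__main__":
    for api, counts in requests_by_api(SAMPLE).items():
        print(api, counts)
```

In the running demo Prometheus does this aggregation itself; the sketch is only meant to make the shape of the scraped data concrete.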

SLIs and SLOs

The definition and example are inspired by https://sre.google/workbook/slo-document/, https://landing.google.com/sre/workbook/chapters/alerting-on-slos/ and https://github.com/google/prometheus-slo-burn-example/blob/master/prometheus/slos.rules.yml.

You will see different indicators displayed on the Grafana dashboard.

To calculate the SLO and the displayed error budget remaining, we use the following SLI/SLO:

SLI: the proportion of successful HTTP requests, as measured from Tyk API Gateway
- Any HTTP status other than 500–599 is considered successful.
- count of http_requests which do not have a 5XX status code divided by count of all http_requests
SLO: 95% successful requests
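The SLI definition above can be sketched as a small calculation. The request counts are made-up numbers and the function names are hypothetical; only the 5XX rule and the 95% target come from the definition.

```python
from typing import Dict

SLO_TARGET = 0.95  # 95% successful requests


def is_successful(status_code: int) -> bool:
    """Any HTTP status outside 500-599 counts as successful."""
    return not 500 <= status_code <= 599


def success_ratio(counts_by_status: Dict[int, int]) -> float:
    """Successful requests divided by all requests."""
    total = sum(counts_by_status.values())
    if total == 0:
        return 1.0  # assumption: no traffic means nothing has failed
    good = sum(n for code, n in counts_by_status.items() if is_successful(code))
    return good / total


if __name__ == "__main__":
    counts = {200: 940, 404: 20, 500: 30, 503: 10}  # made-up numbers
    sli = success_ratio(counts)
    print(f"SLI = {sli:.2%}, SLO met: {sli >= SLO_TARGET}")
```

Note that a 404 counts as successful under this SLI: only server-side errors consume the error budget.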

In slos.rules.yml, we calculate the ratio of errors per request over the last 10 minutes in job:slo_errors_per_request:ratio_rate10m. With job:error_budget:remaining, we calculate the remaining error budget in percent, which is what the Grafana dashboard displays. The dashboard uses a threshold of 95% (every value below 95% is shown in red).
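As a worked illustration of an error ratio over a window and a remaining error budget, here is a minimal sketch. The expressions in slos.rules.yml are not reproduced in this README, so the formula below is one common definition of "error budget remaining" and an assumption; the actual recording rule may compute the value differently. The counter snapshots are made-up numbers.

```python
SLO_TARGET = 0.95
ERROR_BUDGET = 1 - SLO_TARGET  # share of requests that are allowed to fail


def error_ratio(errors_start, errors_end, total_start, total_end):
    """Errors per request over a window, from two counter snapshots."""
    total_delta = total_end - total_start
    if total_delta <= 0:
        return 0.0  # no requests in the window
    return (errors_end - errors_start) / total_delta


def budget_remaining_percent(ratio, budget=ERROR_BUDGET):
    """Unspent share of the error budget in percent (negative if overspent).

    Assumed formula, not necessarily the one used in slos.rules.yml.
    """
    return 100.0 * (1 - ratio / budget)


if __name__ == "__main__":
    # Made-up counter values taken 10 minutes apart
    ratio = error_ratio(errors_start=20, errors_end=32,
                        total_start=1000, total_end=1600)
    print(f"error ratio over the window: {ratio:.3f}")
    print(f"error budget remaining: {budget_remaining_percent(ratio):.1f}%")
```

With these numbers, 12 of 600 requests failed, so 2% of requests were errors against a 5% allowance, leaving roughly 60% of the budget under the assumed formula.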
