A set of Grafana dashboards and Prometheus alerts for Kubernetes Autoscaling using the metrics from Kube-state-metrics, Karpenter and Cluster-autoscaler.
This serves as an extension to the Kubernetes-mixin and adds monitoring for components that are not deployed by default in a Kubernetes cluster (VPA, Karpenter, Cluster-Autoscaler).
The mixin provides the following dashboards:
- Kubernetes Autoscaling
  - Pod Disruption Budgets
  - Horizontal Pod Autoscalers
  - Vertical Pod Autoscalers
- Cluster Autoscaler
- Karpenter
  - Overview
  - Activity
  - Performance
There are also generated dashboards in the ./dashboards_out directory.
There are alerts for the following components currently:
- Karpenter
VPA, Karpenter and Cluster Autoscaler are configurable in the config.libsonnet file. They can be disabled by setting the enabled field to false.
This mixin is designed to be vendored into the repo with your infrastructure config. To do this, use jsonnet-bundler:
jb install github.com/adinhodovic/kubernetes-autoscaling-mixin
You then have three options for deploying your dashboards:
- Generate the config files and deploy them yourself
- Use jsonnet to deploy this mixin along with Prometheus and Grafana
- Use prometheus-operator to deploy this mixin
Alternatively, import the dashboard JSON files from ./dashboards_out, or import the dashboards from the Grafana.com dashboard page.
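For the prometheus-operator option, the alerts can be wrapped in a PrometheusRule resource. The following is a minimal sketch, not the project's own deployment code. It assumes the mixin has been vendored with jsonnet-bundler, that it exposes the conventional monitoring-mixins entry point mixin.libsonnet with a prometheusAlerts field, and that jsonnet is run with -J vendor. The resource name and namespace are hypothetical.

```jsonnet
// Assumption: vendored under vendor/ and evaluated with `jsonnet -J vendor`.
// Assumption: the entry point is named mixin.libsonnet (monitoring-mixins convention).
local mixin = import 'github.com/adinhodovic/kubernetes-autoscaling-mixin/mixin.libsonnet';

{
  apiVersion: 'monitoring.coreos.com/v1',
  kind: 'PrometheusRule',
  metadata: {
    // Hypothetical name and namespace; they must match what your
    // Prometheus resource selects via its rule selectors.
    name: 'kubernetes-autoscaling-mixin-alerts',
    namespace: 'monitoring',
  },
  // prometheusAlerts is expected to have the shape { groups: [...] },
  // which is also the shape of a PrometheusRule spec.
  spec: mixin.prometheusAlerts,
}
```

The rendered JSON can be applied to the cluster as-is, since Kubernetes accepts JSON manifests as well as YAML.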
You can manually generate the alerts, dashboards and rules files, but first you must install some tools:
go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest
brew install jsonnet
Then, grab the mixin and its dependencies:
git clone https://github.com/adinhodovic/kubernetes-autoscaling-mixin
cd kubernetes-autoscaling-mixin
jb install
Finally, build the mixin:
make prometheus_alerts.yaml
make dashboards_out
The prometheus_alerts.yaml file then needs to be passed to your Prometheus server, and the files in dashboards_out need to be imported into your Grafana server. The exact details will depend on how you deploy your monitoring stack.
This mixin has its configuration in the config.libsonnet file. You can disable the alerts for VPA, Karpenter and Cluster Autoscaler by setting the enabled field to false:
{
  _config+:: {
    vpa+:: {
      enabled: false,
    },
    karpenter+:: {
      enabled: false,
    },
    clusterAutoscaler+:: {
      enabled: false,
    },
  },
}
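When the mixin is vendored rather than edited in place, the same override can be applied at import time. This is a sketch under a few assumptions: the import path follows the jsonnet-bundler vendor layout, the entry point is named mixin.libsonnet, and the mixin exposes the conventional prometheusAlerts and grafanaDashboards fields.

```jsonnet
// Assumption: evaluated with `jsonnet -J vendor` after `jb install`.
local mixin = (import 'github.com/adinhodovic/kubernetes-autoscaling-mixin/mixin.libsonnet') + {
  _config+:: {
    // Disable only VPA; Karpenter and Cluster Autoscaler keep their defaults.
    vpa+:: {
      enabled: false,
    },
  },
};

// Expose the rendered alerts and dashboards so the effect of the override
// can be inspected in the output.
{
  alerts: mixin.prometheusAlerts,
  dashboards: mixin.grafanaDashboards,
}
```

Using +:: keeps the remaining defaults from config.libsonnet in place, so only the fields named in the override change.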
The mixin has all components enabled by default and all the dashboards are generated in the dashboards_out directory. You can import them into Grafana.
Kube-state-metrics does not ship with VPA metrics by default. You need to deploy a custom kube-state-metrics with the following configuration:
Adjust the kube-state-metrics ClusterRole to include the following rules:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: exporter
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: kube-prometheus
  name: kube-state-metrics
rules:
  # ... other rules
  - apiGroups:
      - autoscaling.k8s.io
    resources:
      - verticalpodautoscalers
    verbs:
      - list
      - watch
  - apiGroups:
      - apiextensions.k8s.io
    resources:
      - customresourcedefinitions
    verbs:
      - list
      - watch
Adjust the kube-state-metrics Deployment to include the following extra arguments:
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.13.0
  name: kube-state-metrics
  namespace: monitoring
spec:
  ...
      containers:
      - args:
        ...
        - --custom-resource-state-config
        - |
          kind: CustomResourceStateMetrics
          spec:
            resources:
            - groupVersionKind:
                group: autoscaling.k8s.io
                kind: "VerticalPodAutoscaler"
                version: "v1"
              labelsFromPath:
                verticalpodautoscaler: [metadata, name]
                namespace: [metadata, namespace]
                target_api_version: [spec, targetRef, apiVersion]
                target_kind: [spec, targetRef, kind]
                target_name: [spec, targetRef, name]
              metrics:
              # Labels
              - name: "verticalpodautoscaler_labels"
                help: "VPA container recommendations. Kubernetes labels converted to Prometheus labels"
                each:
                  type: Info
                  info:
                    labelsFromPath:
                      name: [metadata, name]
              # Memory Information
              - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_target"
                help: "VPA container recommendations for memory. Target resources the VerticalPodAutoscaler recommends for the container."
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [target, memory]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "memory"
                  unit: "byte"
              - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_lowerbound"
                help: "VPA container recommendations for memory. Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it"
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [lowerBound, memory]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "memory"
                  unit: "byte"
              - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_upperbound"
                help: "VPA container recommendations for memory. Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it"
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [upperBound, memory]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "memory"
                  unit: "byte"
              - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_uncappedtarget"
                help: "VPA container recommendations for memory. Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds"
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [uncappedTarget, memory]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "memory"
                  unit: "byte"
              # CPU Information
              - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_target"
                help: "VPA container recommendations for cpu. Target resources the VerticalPodAutoscaler recommends for the container."
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [target, cpu]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "cpu"
                  unit: "core"
              - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_lowerbound"
                help: "VPA container recommendations for cpu. Minimum resources the container can use before the VerticalPodAutoscaler updater evicts it"
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [lowerBound, cpu]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "cpu"
                  unit: "core"
              - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_upperbound"
                help: "VPA container recommendations for cpu. Maximum resources the container can use before the VerticalPodAutoscaler updater evicts it"
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [upperBound, cpu]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "cpu"
                  unit: "core"
              - name: "verticalpodautoscaler_status_recommendation_containerrecommendations_uncappedtarget"
                help: "VPA container recommendations for cpu. Target resources the VerticalPodAutoscaler recommends for the container ignoring bounds"
                each:
                  type: Gauge
                  gauge:
                    path: [status, recommendation, containerRecommendations]
                    valueFrom: [uncappedTarget, cpu]
                    labelsFromPath:
                      container: [containerName]
                commonLabels:
                  resource: "cpu"
                  unit: "core"
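The eight gauge entries above are the cross product of four recommendation fields (target, lowerBound, upperBound, uncappedTarget) and two resources (memory, cpu). If you already render manifests with jsonnet, the same structure could be generated rather than written by hand. The sketch below is illustrative only: the gauge helper and the local names are hypothetical, and the help strings are shortened compared with the configuration above.

```jsonnet
// Hypothetical helper: one gauge entry per recommendation field and resource.
local gauge(field, resource, unit) = {
  name: 'verticalpodautoscaler_status_recommendation_containerrecommendations_' + std.asciiLower(field),
  help: 'VPA container recommendations for %s (%s).' % [resource, field],
  each: {
    type: 'Gauge',
    gauge: {
      path: ['status', 'recommendation', 'containerRecommendations'],
      valueFrom: [field, resource],
      labelsFromPath: { container: ['containerName'] },
    },
  },
  commonLabels: { resource: resource, unit: unit },
};

local fields = ['target', 'lowerBound', 'upperBound', 'uncappedTarget'];
local units = { memory: 'byte', cpu: 'core' };

// The Info metric from the configuration above, unchanged in structure.
local labelsMetric = {
  name: 'verticalpodautoscaler_labels',
  help: 'VPA container recommendations. Kubernetes labels converted to Prometheus labels',
  each: {
    type: 'Info',
    info: { labelsFromPath: { name: ['metadata', 'name'] } },
  },
};

{
  kind: 'CustomResourceStateMetrics',
  spec: {
    resources: [{
      groupVersionKind: { group: 'autoscaling.k8s.io', kind: 'VerticalPodAutoscaler', version: 'v1' },
      labelsFromPath: {
        verticalpodautoscaler: ['metadata', 'name'],
        namespace: ['metadata', 'namespace'],
        target_api_version: ['spec', 'targetRef', 'apiVersion'],
        target_kind: ['spec', 'targetRef', 'kind'],
        target_name: ['spec', 'targetRef', 'name'],
      },
      // std.objectFields returns keys sorted, so cpu entries come before memory.
      metrics: [labelsMetric] + [
        gauge(field, resource, units[resource])
        for resource in std.objectFields(units)
        for field in fields
      ],
    }],
  },
}
```

Wrapping the final object in std.manifestYamlDoc and evaluating with jsonnet -S would yield a YAML string suitable for the --custom-resource-state-config argument.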
The mixin follows the monitoring-mixins guidelines for alerts.