As more and more teams deploy stateful applications (Kafka, ZooKeeper, RabbitMQ, Redis, etc.) on Kubernetes, it becomes challenging to keep alive the worker nodes where the pods belonging to a StatefulSet/Deployment/Argo Rollout are running. During a full cluster upgrade, worker nodes may need to keep running for an extended period of time in order to ensure zero downtime at the application level while the nodes are rotated.
K8s-shredder introduces the concept of parked nodes, which addresses several critical aspects of rotating worker nodes during a cluster upgrade:
- allows teams running stateful apps to move their workloads off of parked nodes at their own pace, independently of the cluster upgrade lifecycle.
- optimises cloud costs by dynamically purging unschedulable worker nodes (parked nodes).
- notifies clients that they are running workloads on parked nodes so that they can take appropriate action.
To enable k8s-shredder on a Kubernetes cluster, you can use the manifests described in the k8s-shredder spec.
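For example, assuming the manifests have been rendered to a local file (the file name below is hypothetical), deployment is a single apply:

```sh
# Hypothetical manifest file rendered from the k8s-shredder spec;
# adjust the path to wherever you keep the manifests.
kubectl apply -f k8s-shredder.yaml
```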
Then, during a cluster upgrade, while rotating the worker nodes, label the nodes that you want parked with:

```
shredder.ethos.adobe.net/upgrade-status=parked
shredder.ethos.adobe.net/parked-node-expires-on=<Node_expiration_timestamp>
```
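For instance, parking a node with kubectl might look like the following sketch. The node name and expiration value are placeholders, and the expiration format (e.g. a Unix epoch timestamp) is an assumption; it should match whatever your k8s-shredder deployment expects:

```sh
# Parked nodes are unschedulable, so cordon the node first.
kubectl cordon worker-node-1

# Apply the parking labels (node name and timestamp are placeholder values).
kubectl label node worker-node-1 \
  shredder.ethos.adobe.net/upgrade-status=parked \
  shredder.ethos.adobe.net/parked-node-expires-on=1735689600
```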
Additionally, if you want a pod to be exempted from the eviction loop until the parked node's TTL expires, you can label it with "shredder.ethos.adobe.net/allow-eviction=false" so that k8s-shredder knows to skip it.
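As a sketch, the label can be set directly on a running pod, or, more durably, in the pod template of the owning controller (pod name and namespace below are placeholders):

```sh
# Opt a pod out of eviction until the parked node's TTL expires
# (pod name and namespace are placeholder values).
kubectl label pod my-app-0 -n my-namespace \
  shredder.ethos.adobe.net/allow-eviction=false
```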
The following options can be used to customise the k8s-shredder controller:
| Name | Default Value | Description |
|---|---|---|
| EvictionLoopInterval | 60s | How often to run the eviction loop process |
| ParkedNodeTTL | 60m | Time a node can be parked before the force eviction process starts |
| RollingRestartThreshold | 0.5 | Percentage of ParkedNodeTTL that should pass before the rollout restart process starts |
| UpgradeStatusLabel | "shredder.ethos.adobe.net/upgrade-status" | Label used for identifying parked nodes |
| ExpiresOnLabel | "shredder.ethos.adobe.net/parked-node-expires-on" | Label used for identifying the TTL of parked nodes |
| NamespacePrefixSkipInitialEviction | "" | For pods in namespaces with this prefix, proceed directly with a rollout restart without waiting for the RollingRestartThreshold |
| RestartedAtAnnotation | "shredder.ethos.adobe.net/restartedAt" | Annotation used to mark a controller object for rollout restart |
| AllowEvictionLabel | "shredder.ethos.adobe.net/allow-eviction" | Label used to skip evicting pods that have explicitly set this label to false |
| ToBeDeletedTaint | "ToBeDeletedByClusterAutoscaler" | Node taint used to skip a subset of parked nodes that are already handled by cluster-autoscaler |
| ArgoRolloutsAPIVersion | "v1alpha1" | API version from the argoproj.io API group to use when handling Argo Rollouts objects |
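As an illustration, these options might be set via a YAML configuration file along these lines. This is a hypothetical sketch; the actual wiring (e.g. a ConfigMap mounted into the controller) is defined in the k8s-shredder spec:

```yaml
# Hypothetical k8s-shredder configuration sketch; the keys mirror the
# options table above, and most values keep their defaults.
EvictionLoopInterval: 60s
ParkedNodeTTL: 2h
RollingRestartThreshold: 0.5
UpgradeStatusLabel: "shredder.ethos.adobe.net/upgrade-status"
ExpiresOnLabel: "shredder.ethos.adobe.net/parked-node-expires-on"
AllowEvictionLabel: "shredder.ethos.adobe.net/allow-eviction"
```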
K8s-shredder will periodically run eviction loops, based on the configured EvictionLoopInterval, trying to clean up all the pods from the parked nodes. Once all the pods are cleaned up, cluster-autoscaler should chime in and recycle the parked node.
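To observe this process on a live cluster, you could, for example, list the parked nodes and the pods still awaiting eviction on one of them (the node name below is a placeholder):

```sh
# List nodes currently parked by their upgrade-status label.
kubectl get nodes -l shredder.ethos.adobe.net/upgrade-status=parked

# See which pods are still running on a parked node
# (replace worker-node-1 with an actual parked node name).
kubectl get pods --all-namespaces --field-selector spec.nodeName=worker-node-1
```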
The diagram below describes a simple flow of how k8s-shredder handles StatefulSet applications: