Merge pull request #35 from timebertt/docs
Update, rework, and extend docs
Showing 32 changed files with 1,738 additions and 436 deletions.
@@ -0,0 +1 @@
docs/assets/*.jpg filter=lfs diff=lfs merge=lfs -text
@@ -1,75 +1,79 @@
# Kubernetes Controller Sharding

_Towards Horizontally Scalable Kubernetes Controllers_
_Horizontally Scalable Kubernetes Controllers_ 🚀

## About
## TL;DR 📖

This study project is part of my master's studies in Computer Science at the [DHBW Center for Advanced Studies](https://www.cas.dhbw.de/) (CAS).
Make Kubernetes controllers horizontally scalable by distributing reconciliation of API objects across multiple controller instances.
Remove the limitation of having only a single active replica (leader) per controller.

You can download and read the full thesis belonging to this implementation here: [thesis-controller-sharding](https://github.com/timebertt/thesis-controller-sharding).
This repository contains the practical part of the thesis: a sample operator using the sharding implementation, a full monitoring setup, and some tools for demonstration and evaluation purposes.
See [Getting Started With Controller Sharding](docs/getting-started.md) for a quick start with this project.

The controller sharding implementation itself is done generically in [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime).
It is currently located in the `sharding` branch of my fork: https://github.com/timebertt/controller-runtime/tree/sharding.
## About ℹ️

## TL;DR
I started this project as part of my Master's studies in Computer Science at the [DHBW Center for Advanced Studies](https://www.cas.dhbw.de/) (CAS).
I completed a study project ("half-time thesis") about this topic. I'm currently working on my Master's thesis to evolve the project based on the first iteration.

Distribute reconciliation of Kubernetes objects across multiple controller instances.
Remove the limitation of having only one active replica (leader) per controller.
- Download and read the study project (first paper) here: [thesis-controller-sharding](https://github.com/timebertt/thesis-controller-sharding)
- Download and read the Master's thesis (second paper) here (WIP): [masters-thesis-controller-sharding](https://github.com/timebertt/masters-thesis-controller-sharding)

## Motivation
This repository contains the implementation belonging to the scientific work: the actual sharding implementation, a sample operator using controller sharding, a monitoring and continuous profiling setup, and some tools for development and evaluation purposes.

## Motivation 💡

Typically, [Kubernetes controllers](https://kubernetes.io/docs/concepts/architecture/controller/) use a leader election mechanism to determine a *single* active controller instance (leader).
When deploying multiple instances of the same controller, there will only be one active instance at any given time; the other instances remain on standby.
This is done to prevent controllers from performing uncoordinated and conflicting actions (reconciliations).
This is done to prevent multiple controller instances from performing uncoordinated and conflicting actions (reconciliations) on a single object concurrently.

If the current leader goes down and loses leadership (e.g. network failure, rolling update), another instance takes over leadership and becomes the active instance.
Such a setup can be described as an "active-passive HA setup". It minimizes "controller downtime" and facilitates fast failovers.
Such a setup can be described as an "active-passive HA setup". It minimizes "controller downtime" and facilitates fast fail-overs.
However, it cannot be considered as "horizontal scaling" as work is not distributed among multiple instances.

This restriction imposes scalability limitations for Kubernetes controllers.
That is, the rate of reconciliations, the number of objects, etc. is limited by the size of the machine that the active controller runs on and the network bandwidth it can use.
In contrast to typical stateless applications, one cannot increase the throughput of the system by adding more instances (scaling horizontally) but only by using bigger instances (scaling vertically).

This study project presents a design that allows distributing reconciliation of Kubernetes objects across multiple controller instances.
It applies proven sharding mechanisms used in distributed databases to Kubernetes controllers to overcome the restriction of having only one active replica per controller.
The sharding design is implemented in a generic way in my [fork](https://github.com/timebertt/controller-runtime/tree/sharding) of the Kubernetes [controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) project.
The [webhosting-operator](#webhosting-operator) is implemented as an example operator that uses the sharding implementation for demonstration and evaluation.
These are the first steps toward horizontally scalable Kubernetes controllers.

## Sharding Design

![Sharding Architecture](assets/architecture.svg)

High-level summary of the sharding design:

- multiple controller instances are deployed
- one controller instance is elected to be the sharder via the usual leader election
- all instances maintain individual shard leases for announcing themselves to the sharder (membership and failure detection)
- the sharder watches all objects (metadata-only) and the shard leases
- the sharder assigns individual objects to shards (using consistent hashing) by labeling them with the `shard` label (see the sketch below)
- the shards use a label selector to restrict the cache and controller to the set of objects assigned to them
- for moving objects (e.g. during rebalancing on scale-out), the sharder drains the object from the old shard by adding the `drain` label; after the shard has acknowledged the drain operation by removing both labels, the sharder assigns the object to the new shard
- when a shard releases its shard lease (voluntary disruption), the sharder assigns the objects to another active instance
- when a shard loses its shard lease, the sharder acquires the shard lease (to ensure the API server's reachability/functionality) and forcefully reassigns the objects
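
For illustration, here is a minimal, generic hash-ring sketch in Go showing how consistent hashing can map object keys onto shard instances. This is not the sharder's actual implementation; the hash function (FNV-1a), the number of virtual tokens per shard, and all names are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// ring is a simple consistent-hash ring: each shard instance owns several
// virtual tokens; an object is assigned to the shard owning the first token
// clockwise of the object's hash.
type ring struct {
	tokens []uint32          // sorted virtual-node hashes
	owner  map[uint32]string // token hash -> shard (instance) name
}

func hashKey(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32()
}

func newRing(shards []string, tokensPerShard int) *ring {
	r := &ring{owner: map[uint32]string{}}
	for _, s := range shards {
		for i := 0; i < tokensPerShard; i++ {
			t := hashKey(fmt.Sprintf("%s-%d", s, i))
			r.tokens = append(r.tokens, t)
			r.owner[t] = s
		}
	}
	sort.Slice(r.tokens, func(i, j int) bool { return r.tokens[i] < r.tokens[j] })
	return r
}

// assign returns the shard responsible for the given object key (e.g. "namespace/name").
func (r *ring) assign(objectKey string) string {
	h := hashKey(objectKey)
	i := sort.Search(len(r.tokens), func(i int) bool { return r.tokens[i] >= h })
	if i == len(r.tokens) {
		i = 0 // wrap around the ring
	}
	return r.owner[r.tokens[i]]
}

func main() {
	r := newRing([]string{"operator-0", "operator-1", "operator-2"}, 100)
	fmt.Println(r.assign("default/my-website")) // prints the responsible shard
}
```

In the design summarized above, only the sharder computes such assignments and persists the result in the `shard` label; the shards themselves never run the hash ring.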

Read chapter 4 of the full [thesis](https://github.com/timebertt/thesis-controller-sharding) for a detailed explanation of the sharding design.

## Contents of This Repository

- [docs](docs):
  - [getting started with controller sharding](docs/getting-started.md)
- [sharder](cmd/sharder): the main sharding component orchestrating sharding
- [shard](hack/cmd/shard): a simple dummy controller that implements the requirements for sharded controllers (used for development and testing purposes)
- [webhosting-operator](webhosting-operator): a sample operator for demonstrating and evaluating the implemented sharding design for Kubernetes controllers
  - [samples-generator](webhosting-operator/cmd/samples-generator): a tool for generating a given number of random `Website` objects
- [monitoring setup](hack/config/monitoring): a setup for monitoring and measuring load test experiments for the sample operator
  - includes [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus)
  - [sharding-exporter](config/monitoring/sharding-exporter) (based on the [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics) [custom resource metrics feature](https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md)) for metrics on the state of shards
  - [webhosting-exporter](webhosting-operator/config/monitoring/webhosting-exporter) for metrics on the state of the webhosting-operator's API objects (similar to above)
  - [grafana](https://github.com/grafana/grafana) along with some dashboards for [controller-runtime](hack/config/monitoring/default/dashboards) and [webhosting-operator and sharding](webhosting-operator/config/monitoring/default/dashboards)
- [experiment](webhosting-operator/cmd/experiment): a tool (based on controller-runtime) for executing load test scenarios for the webhosting-operator
- [measure](webhosting-operator/cmd/measure): a tool for retrieving configurable measurements from Prometheus and storing them in CSV-formatted files for further analysis (with `numpy`) and visualization (with `matplotlib`)
- a few [kyverno](https://github.com/kyverno/kyverno) policies for [scheduling](webhosting-operator/config/policy) and the [control plane](hack/config/policy) for more stable load test results
- a simple [parca](https://github.com/parca-dev/parca) setup for [profiling](hack/config/policy) the sharding components and webhosting-operator during load tests
## Introduction 🚀

This project allows scaling Kubernetes controllers horizontally by removing the restriction of having only one active replica per controller (allows active-active setups).
It distributes reconciliation of Kubernetes objects across multiple controller instances, while still ensuring that only a single controller instance acts on a single object at any given time.
For this, the project applies proven sharding mechanisms used in distributed databases to Kubernetes controllers.

The project introduces a `sharder` component that implements sharding in a generic way and can be applied to any Kubernetes controller (independent of the programming language and controller framework used).
The `sharder` component is installed into the cluster along with a `ClusterRing` custom resource.
A `ClusterRing` declares a virtual ring of sharded controller instances and specifies the API resources that should be distributed across the shards in the ring.
It configures sharding on the cluster-scope level (i.e., objects in all namespaces), hence the `ClusterRing` name.
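
As a rough illustration, a `ClusterRing` object could be constructed programmatically as sketched below. Everything beyond the general idea is an assumption for illustration (the group/version, the `spec.resources` field, and the example values); the CRD shipped with this repository is the authoritative schema.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
	// Hypothetical sketch: group/version, kind, and the spec.resources field below
	// are assumptions for illustration, not the confirmed API of this project.
	ring := &unstructured.Unstructured{Object: map[string]interface{}{}}
	ring.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "sharding.timebertt.dev",
		Version: "v1alpha1",
		Kind:    "ClusterRing",
	})
	ring.SetName("example-ring")

	// Declare which API resources should be distributed across the shards in this ring.
	if err := unstructured.SetNestedSlice(ring.Object, []interface{}{
		map[string]interface{}{"group": "webhosting.timebertt.dev", "resource": "websites"},
	}, "spec", "resources"); err != nil {
		panic(err)
	}

	fmt.Println(ring.GetName(), ring.GroupVersionKind())
	// The object would then be applied with a Kubernetes client, e.g. client.Create(ctx, ring).
}
```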

The watch cache is an expensive part of a controller in terms of network transfer, CPU (decoding), and memory (local copy of all objects).
When running multiple instances of a controller, the individual instances must therefore watch only the subset of objects they are responsible for.
Otherwise, the setup would only multiply the resource consumption.
The sharder assigns objects to instances via the shard label.
Each shard then uses a label selector with its own instance name to watch only the objects that are assigned to it.
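
For example, with controller-runtime a shard can restrict its cache (and thereby its watches and reconciliations) to its assigned objects roughly as follows. This is a minimal sketch assuming a recent controller-runtime version; the shard label key, the environment variable, and the example resource (`ConfigMap`) are placeholders, not the keys or resources actually used by this project.

```go
package main

import (
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	// Placeholder values: the actual shard label key is defined by the sharding setup,
	// and the instance name is this shard's own identity.
	shardLabel := "example.sharding/shard" // assumption for illustration
	instanceName := os.Getenv("POD_NAME")

	// Only cache (and therefore watch and reconcile) objects labeled for this shard.
	selector := labels.SelectorFromSet(labels.Set{shardLabel: instanceName})

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			ByObject: map[client.Object]cache.ByObject{
				&corev1.ConfigMap{}: {Label: selector}, // restrict each sharded resource
			},
		},
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // set up controllers as usual; their watches go through the filtered cache
}
```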

Alongside the actual sharding implementation, this project contains a setup for simple [development, testing](docs/development.md), and [evaluation](docs/evaluation.md) of the sharding mechanism.
This includes an example operator that uses controller sharding ([webhosting-operator](webhosting-operator)).
See [Getting Started With Controller Sharding](docs/getting-started.md) for more details.

To support sharding in your Kubernetes controller, only three aspects need to be implemented:

- announce ring membership and shard health: maintain individual shard `Leases` instead of performing leader election on a single `Lease`
- only watch, cache, and reconcile objects assigned to the respective shard: add a shard-specific label selector to watches
- acknowledge object movements during rebalancing: remove the drain and shard labels when the drain label is set and stop reconciling the object (see the sketch below)

See [Implement Sharding in Your Controller](docs/implement-sharding.md) for more information and examples.
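
To illustrate the third point, a reconciler could acknowledge a drain operation roughly as follows. This is a hedged sketch: the label keys below are placeholders (the actual keys are defined by the sharder), and `ConfigMap` merely stands in for whatever resource your controller reconciles.

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// Placeholder label keys for illustration only; the real keys are defined by the sharder.
const (
	shardLabel = "example.sharding/shard"
	drainLabel = "example.sharding/drain"
)

type ShardedReconciler struct {
	client.Client
}

func (r *ShardedReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	obj := &corev1.ConfigMap{} // stand-in for the actual sharded resource
	if err := r.Get(ctx, req.NamespacedName, obj); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// If the sharder asked this shard to hand the object over, acknowledge the drain:
	// remove both labels and stop reconciling. The sharder then assigns the object
	// to its new shard.
	if _, drain := obj.Labels[drainLabel]; drain {
		patch := client.MergeFrom(obj.DeepCopy())
		delete(obj.Labels, drainLabel)
		delete(obj.Labels, shardLabel)
		if err := r.Patch(ctx, obj, patch); err != nil {
			return ctrl.Result{}, err
		}
		return ctrl.Result{}, nil
	}

	// ... regular reconciliation logic for objects assigned to this shard ...
	return ctrl.Result{}, nil
}
```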

## Design 📐

![Sharding Architecture](docs/assets/architecture.svg)

See [Design](docs/design.md) for more details on the sharding architecture and design decisions.

## Discussion 💬

Feel free to contact me on the [Kubernetes Slack](https://kubernetes.slack.com/) ([get an invitation](https://slack.k8s.io/)): [@timebertt](https://kubernetes.slack.com/team/UF8C35Z0D).

## TODO 🧑‍💻

- [ ] implement more tests: unit, integration, e2e tests
- [ ] add `ClusterRing` API validation
- [ ] implement a custom generic sharding-exporter