providing new standard for node distribution in draft version - to be… #806

Draft
wants to merge 4 commits into base: main
130 changes: 130 additions & 0 deletions Standards/scs-0214-v3-k8s-node-distribution.md
---
title: Kubernetes Node Distribution and Availability
type: Standard
status: Draft
replaces: scs-0214-v1-k8s-node-distribution.md
track: KaaS
---

## Introduction

A Kubernetes instance is provided as a cluster, which consists of a set of machines,
so-called nodes. A cluster is composed of a control plane and at least one worker node.
The control plane manages the worker nodes and therefore the pods in the cluster by making
decisions about scheduling, event detection and rights management. Inside the control plane,
multiple components exist, which can be duplicated and distributed over multiple nodes
inside the cluster. Typically, no user workloads are run on these nodes in order to
separate the control plane components from user workloads, which could otherwise pose a security risk.

The Kubernetes project maintains multiple release versions, with the three most recent minor
versions actively supported, along with a fourth version in development.
Each new minor version replaces the oldest version at the end of its support period,
which typically spans approximately 14 months: a 12-month standard support period
followed by a roughly 2-month maintenance period during which only critical fixes are provided.

### Glossary

The following terms are used throughout this document:

| Term | Meaning |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Worker | Virtual or bare-metal machine, which hosts workloads of customers |
| Control Plane | Virtual or bare-metal machine, which hosts the container orchestration layer that exposes the API and interfaces to define, deploy, and manage the lifecycle of containers. |
| Machine | Virtual or bare-metal entity with computational capabilities |
| Failure Zone | A logical entity representing a group of physical machines that share a risk of failure due to their proximity or dependency on common resources. |

## Motivation

In normal day-to-day operation, operational failures are not unusual, whether due to
wear and tear of hardware, software misconfiguration, external problems, or user error.
Whatever the source of such an outage, it means downtime for operations and users and
possibly even data loss.
Therefore, a Kubernetes cluster in a productive environment should be distributed over
multiple "failure zones" in order to provide fault-tolerance and high availability.
This is especially important for the control plane of the cluster, since it contains the
state of the whole cluster. A failure of this component could mean an unrecoverable failure
of the whole cluster.

## Design Considerations

Most design considerations of this standard follow the previously written Decision Record
[Kubernetes Nodes Anti Affinity][scs-0213-v1] as well as the Kubernetes documents on
[High Availability][k8s-ha] and [Best practices for large clusters][k8s-large-clusters].

The SCS prefers distributed, highly available systems due to advantages such as fault tolerance and
data redundancy. It also acknowledges the costs and overhead for providers associated with this effort,
given that hardware and infrastructure may be dedicated to fail-over safety and duplication.

The [Best practices for large clusters][k8s-large-clusters] documentation describes the concept
of a failure zone. This term is context-dependent and describes a group of physical machines that are close
enough—physically or logically—that a specific issue could affect all machines in the zone.
To mitigate this, critical data and services should not be confined to one failure zone.
How a failure zone is defined depends on the risk model and infrastructure capabilities of the provider,
ranging from single machines or racks to entire datacenters or regions. Failure zones are therefore logical
entities that should not be strictly defined in this document.


## Decision

This standard formulates the requirements for the distribution of Kubernetes nodes to provide a fault-tolerant
and available Kubernetes cluster infrastructure. Since some providers only have small environments to work
with and therefore could not comply with this standard, it will be treated as a RECOMMENDED standard,
from which providers can OPT OUT.

### Control Plane Requirements

1. **Distribution Across Physical Machines**: Control plane nodes MUST be distributed over multiple physical
machines to avoid single points of failure, aligning with Kubernetes best practices.
2. **Failure Zone Placement**: At least one control plane instance MUST be run in each defined failure zone.
More instances in each failure zone are RECOMMENDED to enhance fault tolerance within each zone.

### Worker Node Requirements

- Worker nodes are RECOMMENDED to be distributed over multiple failure zones. This policy makes
it OPTIONAL to provide a worker node in each failure zone, meaning that worker nodes
can also be scaled vertically first before scaling horizontally.
- Worker node distribution MUST be indicated to the user through the labeling described below
in order to enable (anti-)affinity for workloads over failure zones.


To provide metadata about node distribution and enable efficient workload scheduling and testing of this standard,
providers MUST label their Kubernetes nodes with the following labels. These labels MUST remain current with the
deployment’s state.

- `topology.kubernetes.io/zone`
- Corresponds with the label described in [K8s labels documentation][k8s-labels-docs].
This label provides a logical failure zone identifier on the provider side,
such as a server rack in the same electrical circuit. It is typically autopopulated by either
the kubelet or external mechanisms like the cloud controller.

- `topology.kubernetes.io/region`
- This label groups multiple failure zones into a region, such as a building with multiple racks.
It is typically autopopulated by the kubelet or a cloud controller.

- `topology.scs.community/host-id`
- This SCS-specific label MUST contain the unique hostID of the physical machine running the hypervisor,
helping to identify how nodes are distributed across physical machines.
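
As an illustration of how workloads can consume these labels, the sketch below builds a Deployment
fragment that spreads replicas across the zones exposed via `topology.kubernetes.io/zone`. The manifest
is purely hypothetical (the `demo` name, `app: demo` selector, and `nginx` image are placeholders) and is
not required by this standard; it only shows the intended use of the topology labels.

```python
import json

# Hypothetical Deployment fragment: spread replicas evenly across the
# failure zones advertised through the topology.kubernetes.io/zone label.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "demo"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "demo"}},
        "template": {
            "metadata": {"labels": {"app": "demo"}},
            "spec": {
                "topologySpreadConstraints": [
                    {
                        "maxSkew": 1,
                        "topologyKey": "topology.kubernetes.io/zone",
                        "whenUnsatisfiable": "ScheduleAnyway",
                        "labelSelector": {"matchLabels": {"app": "demo"}},
                    }
                ],
                "containers": [{"name": "demo", "image": "nginx"}],
            },
        },
    },
}

# The JSON output can be applied as-is, e.g. with `kubectl apply -f deployment.json`;
# an equivalent YAML manifest achieves the same spreading behavior.
print(json.dumps(deployment, indent=2))
```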

## Conformance Tests

The `k8s-node-distribution-check.py` script assesses node distribution using a user-provided kubeconfig file.
It verifies compliance based on the `topology.scs.community/host-id`, `topology.kubernetes.io/zone`,
`topology.kubernetes.io/region`, and `node-role.kubernetes.io/control-plane` labels.
The script produces errors if node distribution does not meet the standard’s requirements and generates
warnings if labels appear incomplete.
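
The authoritative test logic lives in the script itself; the following is only a simplified sketch of the
kind of check it performs, assuming the `kubernetes` Python client is installed and the kubeconfig grants
permission to list nodes.

```python
from kubernetes import client, config

REQUIRED_LABELS = (
    "topology.scs.community/host-id",
    "topology.kubernetes.io/zone",
    "topology.kubernetes.io/region",
)


def check_control_plane_distribution() -> int:
    config.load_kube_config()                      # user-provided kubeconfig
    nodes = client.CoreV1Api().list_node().items

    hosts, zones = set(), set()
    for node in nodes:
        labels = node.metadata.labels or {}
        if "node-role.kubernetes.io/control-plane" not in labels:
            continue                               # only control plane nodes are checked here
        missing = [key for key in REQUIRED_LABELS if key not in labels]
        if missing:
            print(f"WARNING: node {node.metadata.name} lacks labels: {', '.join(missing)}")
        hosts.add(labels.get("topology.scs.community/host-id"))
        zones.add(labels.get("topology.kubernetes.io/zone"))

    distinct_hosts = hosts - {None}
    distinct_zones = zones - {None}
    if len(distinct_hosts) < 2:
        print("ERROR: control plane nodes are not distributed over multiple physical machines")
        return 1
    print(f"Control plane spans {len(distinct_hosts)} hosts and {len(distinct_zones)} zones")
    return 0


if __name__ == "__main__":
    raise SystemExit(check_control_plane_distribution())
```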

## Previous Standard Versions

This version extends [version 1](scs-0214-v1-k8s-node-distribution.md) by enhancing node labeling requirements.

[k8s-ha]: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/
[k8s-large-clusters]: https://kubernetes.io/docs/setup/best-practices/cluster-large/
[scs-0213-v1]: https://github.com/SovereignCloudStack/standards/blob/main/Standards/scs-0213-v1-k8s-nodes-anti-affinity.md
[k8s-labels-docs]: https://kubernetes.io/docs/reference/labels-annotations-taints/#topologykubernetesiozone

55 changes: 55 additions & 0 deletions Standards/scs-0219-v1-high availability.md
---
title: Kubernetes High Availability (HA)
type: Standard
status: Draft
track: KaaS
---

## Introduction

High availability (HA) is a critical design principle in Kubernetes clusters to ensure operational continuity and
minimize downtime during failures. The control plane is the central component of a Kubernetes cluster, managing the state
and operations of the entire system. Ensuring HA involves distributing the control plane across multiple physical or
logical failure zones to reduce risks from localized failures.

## Glossary

| Term | Meaning |
|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Control Plane | Virtual or bare-metal machine, which hosts the container orchestration layer that exposes the API and interfaces to define, deploy, and manage the lifecycle of containers. |
| Failure Zone | A logical grouping of machines with shared dependencies, such as network infrastructure, power supply, or physical proximity, that may fail as a unit. |

## Motivation

High availability (HA) is essential for ensuring the reliable operation of Kubernetes clusters, especially in production
environments where downtime can lead to significant operational disruption.

Failures of single hosts are far more common than the outage of an entire room, rack, or availability zone (AZ).
Hosts can fail for a variety of reasons, including:

* Hardware failures: broken RAM, PSU (power supply unit), or network ports.
* Operational issues: regular maintenance activities, such as hypervisor or firmware upgrades.

Distributing the control plane across multiple failure zones provides fault tolerance by ensuring that the cluster can
continue functioning even if one or more zones become unavailable. This setup enhances resilience by allowing the system
to recover from failures with minimal disruption. For example, in the event of a hardware failure, a network outage,
or a power disruption in one zone, the other zones can seamlessly take over control plane responsibilities.

Moreover, HA setups improve data consistency and cluster stability by ensuring quorum for distributed systems like etcd.
This prevents scenarios where the cluster state becomes inaccessible or inconsistent due to partial outages. By adhering
to HA principles, organizations can achieve greater uptime, maintain service-level agreements (SLAs), and ensure a
seamless user experience even in the face of unexpected disruptions.
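
To make the quorum argument concrete: a cluster of n etcd members needs ⌊n/2⌋ + 1 members to reach quorum,
so three members tolerate one failure and five tolerate two, while even member counts add cost without adding
fault tolerance. A minimal sketch of this arithmetic (illustrative only, not part of the standard):

```python
def etcd_fault_tolerance(members: int) -> tuple[int, int]:
    """Return (quorum size, tolerated failures) for an etcd cluster of `members` nodes."""
    quorum = members // 2 + 1
    return quorum, members - quorum


for members in (1, 2, 3, 4, 5):
    quorum, tolerated = etcd_fault_tolerance(members)
    print(f"{members} members: quorum {quorum}, tolerates {tolerated} failure(s)")
```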

## Decision

Control plane nodes MUST be distributed across at least three distinct physical hosts. This setup ensures resilience to
individual host-level failures caused by issues such as broken RAM, power supply unit (PSU) failures, network interface
card (NIC) malfunctions, or planned maintenance operations like firmware updates or hypervisor upgrades.
These events occur significantly more often in data centers than the complete failure of a room, rack, or
availability zone (AZ). As such, prioritizing distribution across physical hosts provides a practical and robust
baseline for HA, even in environments where multi-AZ configurations are not feasible.
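
As an illustration, this requirement can be verified against a running cluster with a short script such as
the sketch below, which assumes the `kubernetes` Python client and the `topology.scs.community/host-id`
label defined in the node distribution standard (SCS-0214) being set on all nodes.

```python
from kubernetes import client, config

config.load_kube_config()
nodes = client.CoreV1Api().list_node().items

# Collect the physical host of every control plane node.
host_ids = {
    (node.metadata.labels or {}).get("topology.scs.community/host-id")
    for node in nodes
    if "node-role.kubernetes.io/control-plane" in (node.metadata.labels or {})
}
host_ids.discard(None)  # unlabeled nodes cannot be counted

if len(host_ids) >= 3:
    print(f"OK: control plane runs on {len(host_ids)} distinct physical hosts")
else:
    print("FAIL: control plane nodes must span at least three distinct physical hosts")
```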

## Documents

* [Creating Highly Available Clusters with kubeadm](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/)
* [Community Discussions and Notes](https://github.com/SovereignCloudStack/standards/issues/639)
* [Options for Highly Available Topology](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/ha-topology/)