
Kubernetes 101

TL;DR https://cloud.google.com/kubernetes-engine/kubernetes-comic/

This is an opinionated view of the past, present and future of managed compute resources, and of how Kubernetes is uniquely positioned to play a significant role in realising the goals of the agile manifesto, particularly around the challenges of the 'devops' space.

In addition, serverless and event-driven microservices architectures using AWS Lambda, SQS, Kinesis and similar services built by other providers represent a view of where true cloud native architectures are headed, and some approaches addressed by container management now may be superseded as both organisations and streaming frameworks mature.

Either way, it is highly likely that enterprises will need to support monolithic applications for the foreseeable future, so thinking about ways to decompose these incrementally into manageable units of functionality - with a strategic architecture in mind - is going to be an ongoing focus for forward-thinking organisations.

Background

In the beginning, servers were glorified PCs used to share data, then:

  • rooms of networked servers were used as data centres (and for Friday night Doom sessions)
  • these rooms were outsourced to hosting providers
  • hosting providers realised they could cut costs by sharing resources allocated to customers, as they were not always fully utilised
  • new generations of hosting providers leveraged virtualisation to provide unprecedented virtual network capabilities and availability levels, and the 'cloud' was born
  • organisations started moving local/offshore DCs into the cloud

What is Kubernetes (k8s)?

Kubernetes grew out of 'Borg', the container management system that Google used internally to run its services for over ten years before the lessons learned were reworked and released to the development community as an open source project.

Virtually everything Google does runs on this platform or variants thereof.

Before answering the 'what is k8s?' question in more detail, let's further summarise some of the history and concepts that led us here:

  • what are virtual machines?
  • what is docker? (and other container types)
  • what are the problems that each try to solve?

What are virtual machines?

Virtual machines are primarily a hardware abstraction layer and scheduling engine.

  • VMs allow software designed to run on a specific architecture or operating system to run on another operating system/platform.

    • This is really useful for sharing servers and apps with others without having to provide detailed instructions to get things going, and alleviates the 'it works on my machine' frustration.
  • VMs also allow a single physical host to appear as many logical hosts.

    • This is the key approach used by 'cloud' providers to sell units of compute time etc.

These concepts have existed in *nix for many years, but it was when desktop and hosted server virtualisation hit the mainstream that things really took off and paved the way for the cloud services we take for granted today.

There are two types of hypervisor providing this abstraction layer and resource scheduling:

  • Type 1: runs directly on hardware without a host OS. Also referred to as native or 'bare metal'.
    • eg. AWS custom hardware with a minimal kernel and Xen/KVM running EC2 instances.
  • Type 2: runs on a host OS.
    • eg. Windows 10 running VirtualBox launching an Ubuntu VM.

What about the 'cloud'?

Virtualisation is the foundation of the 'cloud' provider compute resources we see today, such as AWS EC2, Azure, GCE and Digital Ocean. These services now generally use Type 1 hypervisors such as Xen, KVM, or custom hypervisors, with the aim of minimising the impact of virtualisation on guest processes as much as possible.

In addition to compute virtualisation, these services compete via ancillary services such as network/security infrastructure, storage, and specialised services such as VMs optimised for machine learning.

So what is the problem with VMs?

The main issue with VMs, aside from performance overheads, is that each approach to virtualising a machine is different, often proprietary, and opaque in the case of cloud providers, which each have their own vendor-specific APIs to manage platform resources. VMs can also be huge to transfer once built/baked.

There are open standards for local virtualised hardware that will continue to play a role. HashiCorp in particular created Vagrant for VM abstraction, and Terraform as a DSL to describe cloud service resources; however, there are many challenges that are vendor specific and change over time.

So how do we fix this and have a standard way of dealing with compute resources?

Yet another abstraction layer! ...but one tightly coupled to the Linux kernel using proven mechanisms.

In order to deal with different cloud providers' resource management approaches, this homogeneous abstraction layer still uses the specific types of virtual machines provided by AWS or other providers as compute 'nodes'.

Once a node is attached to the cluster, we no longer care where it is hosted. It is a similar story for other resource types: each provider has a solution, but they differ slightly, so they are abstracted away so that k8s workloads do not need to deal with platform-specific features.

K8s is built from the ground up to have pluggable extensibility.
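
As a minimal sketch of that provider-agnostic view, assuming the official Kubernetes Python client and a kubeconfig already pointing at a cluster, listing nodes looks identical whether they are EC2 instances, GCE VMs or bare metal:

```python
# A minimal sketch using the official Kubernetes Python client
# (pip install kubernetes); assumes a kubeconfig pointing at a cluster.
from kubernetes import client, config

config.load_kube_config()  # read credentials from ~/.kube/config
v1 = client.CoreV1Api()

# Nodes look the same to the API regardless of which provider hosts them.
for node in v1.list_node().items:
    print(node.metadata.name, node.status.node_info.kubelet_version)
```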

Start with the smallest unit

It is accepted that containers are the build output unit that should encapsulate the code asset to be deployed to a homogeneous network.

It is also accepted that the container itself should not describe how it should be deployed.

However, Kubernetes does not treat containers as the smallest deployable unit. Instead it uses the concept of a 'pod'. A pod can contain multiple containers where best-practice use cases call for it, allowing groups of containers to be scheduled and scaled as a whole.
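
As an illustrative example of a multi-container pod, here is a web server with a logging sidecar defined via the same Python client; the names, images and the sidecar's command are hypothetical placeholders, not a recommendation:

```python
from kubernetes import client, config

config.load_kube_config()

# A hypothetical two-container pod: both containers share the pod's
# network namespace and are scheduled and scaled together.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="web-with-sidecar"),
    spec=client.V1PodSpec(containers=[
        client.V1Container(name="web", image="nginx:1.13"),
        # Example sidecar slot, e.g. where a log shipper would sit.
        client.V1Container(name="log-shipper", image="busybox",
                           command=["sh", "-c", "tail -f /dev/null"]),
    ]),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```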

A detailed discussion on types of deployments will be provided separately.

Declarative deployments

Rather than spelling out how a service should be deployed and exposed, k8s encourages specifying the desired outcome and letting k8s and the helm tiller process work out how things should happen.

The deployment descriptor referencing the container should control how many instances run and their geographic distribution, along with any dependency types and their scaling patterns.

This can be achieved simply via k8s manifests deployed with kubectl; however, the 'Helm Chart' approach is typically used in order to allow for versioned releases and rollbacks.
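
To make the declarative idea concrete, here is a sketch (again with the Python client, equivalent to a kubectl-applied manifest; the name and image are hypothetical) that states a desired outcome of three replicas and leaves the 'how' to the controllers:

```python
from kubernetes import client, config

config.load_kube_config()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="hello-web"),
    spec=client.V1DeploymentSpec(
        replicas=3,  # the desired state; k8s works out how to converge on it
        selector=client.V1LabelSelector(match_labels={"app": "hello-web"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "hello-web"}),
            spec=client.V1PodSpec(containers=[
                client.V1Container(name="web", image="example/hello:1.0"),
            ]),
        ),
    ),
)
client.AppsV1Api().create_namespaced_deployment(
    namespace="default", body=deployment)
```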

Cloud provider support

Increasingly we are seeing efforts by cloud providers to focus on container based and event driven serverless strategies.

  • they are faster, cheaper, and simpler to operate
  • customers do not want to lock in to vendor-specific approaches when a community-led initiative such as k8s is becoming a de facto standard.

To date, all the major cloud providers as well as Docker have recognised that Kubernetes is the best way to deal with cross-context container scheduling complexities, and have released or announced managed k8s offerings (AWS most recently, with EKS).

What is docker?

Docker is currently the de facto standard for 'containers', although k8s is not locked to it and also supports the competing rkt container runtime.

Docker is built on 'cgroups' (control groups), a Linux kernel feature (originally used by Docker via LXC) that facilitates allocating resources (CPU/memory/network etc.) on a Linux system to sets of tasks in a hierarchical, access-controlled structure.

This kernel feature was arguably underutilised until Docker Inc. built tools on this foundation to achieve the benefits of virtual machines without the overhead of emulating hardware in addition to the desired application environment.

Docker consists of services and tools that manage cgroup-segmented resources as 'containers', in a way that lets hierarchies of images share layers.
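
To see the cgroup link for yourself, a tiny sketch (assuming a Linux host with cgroup v1 mounted at /sys/fs/cgroup and Docker's default cgroupfs layout; both are assumptions) reads a running container's memory limit straight from the kernel's view:

```python
from pathlib import Path

def container_memory_limit(container_id: str) -> int:
    """Read a container's memory limit from its cgroup (cgroup v1 layout)."""
    # Docker places each container in its own cgroup; the exact path layout
    # here (memory controller, 'docker' parent group) is an assumption.
    path = (Path("/sys/fs/cgroup/memory/docker") / container_id
            / "memory.limit_in_bytes")
    return int(path.read_text())
```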

The key to this is that all Linux distributions use the same kernel (within versions). For example, a docker image can be created from another docker image as a base, such as the signed Ubuntu image.

Subsequent docker images can also be created 'FROM' this Ubuntu image, but as you already have the Ubuntu base image locally (and the host provides the kernel), it does not need to be downloaded again for each image based on Ubuntu.

This is a huge time/cost saving, and the resulting images can subsequently be published for others to use and build upon.

Kernel (host) -> Ubuntu image (local image) -> Ubuntu NGINX (pulled from registry)

Because docker instances use the local kernel and dependencies to run natively, this is a game changer for performance. Additional instances can be started in seconds - as long as the host machine has the resources to support this.
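
A short sketch with the Docker SDK for Python (pip install docker; the image tag is just an example) shows the effect: the first pull fetches the base layers, after which containers start in seconds because those layers are reused from the local cache:

```python
import docker

client = docker.from_env()

# First pull downloads the Ubuntu base layers; later pulls of images
# built FROM the same base reuse the cached layers.
client.images.pull("ubuntu", tag="16.04")

# Starting a container is fast: no hardware emulation, just a cgroup
# around processes running on the host kernel.
output = client.containers.run(
    "ubuntu:16.04", "echo hello from a container", remove=True)
print(output.decode())
```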

Why should I care about k8s?

No-one likes staying back for releases. They should not be events; they should just happen automatically when certain criteria are met. Continuous deployment is where we are headed as an industry, and k8s is a proven key enabler on this journey.

From a commercial perspective, under-utilised cloud VMs can be stacked efficiently with workloads, rather than having dedicated VMs per service set and guesstimating resource usage. Autoscaling can occur both at the node (host provider) level and at the pod level (pods are covered in a separate session).
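
As a sketch of pod-level autoscaling (a HorizontalPodAutoscaler via the Python client; the target deployment name and thresholds are hypothetical), the desired bounds are declared and k8s adjusts the replica count to match observed load:

```python
from kubernetes import client, config

config.load_kube_config()

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="hello-web"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="hello-web"),
        min_replicas=2,    # keep at least two pods for availability
        max_replicas=10,   # cap the spend
        target_cpu_utilization_percentage=60,
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```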

In addition, docker uses a git-like hierarchy of image hashes, and images can be pulled from public image repositories or private/hosted repos and validated by hash. This means confidence in the integrity of deployed assets.
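
For example, with the Docker SDK again, the content-addressable identifiers are visible after a pull; pinning deployments to one of the repo digests (rather than a mutable tag) is what gives that integrity guarantee:

```python
import docker

client = docker.from_env()
image = client.images.pull("ubuntu", tag="16.04")

# The image ID and repo digests are content hashes of the image bytes;
# deployments pinned to a digest cannot silently change underneath you.
print(image.id)                        # local content-addressable ID
print(image.attrs.get("RepoDigests"))  # registry digests to pin against
```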

A key conceptual difference between virtual machines and containers is that containers focus on deploying code and services in a reproducible manner, rather than on the platform required to make that happen.

K8s is the platform that accepts docker containers as a great way to package code deployments, and provides a platform-independent way to achieve HA deployments and other advanced features without vendor lock-in.
