Kubernetes: The Universal Control Plane

The innovation is in the API

This is the first in a two-part series on Kubernetes. Part one is an answer: What are the key ideas that influence the design of Kubernetes? What about Kubernetes sets it apart from other platforms? Part two is about the second order effects: How does the industry respond to a universal control plane? What will our tooling, practices, and platforms look like in the future?

Ask a developer what Kubernetes is about, and the response would probably be something related to container orchestration. It is a fair answer, considering the description on the homepage:

Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications.

During the fierce container wars of the past few years, Kubernetes positioned itself as the dominant container platform. But in the long term, I think the vision of Kubernetes centers not around containers but around its API: an attempt to be a platform for managing software at a more fundamental and encompassing level.

To put it briefly, I think the goal is for the Kubernetes API to be the universal control plane of software. The API aims to be the authoritative interface for managing software. If we can model the domain as a resource, Kubernetes should manage it. Successful execution of the project’s strategy will shift cloud computing, software infrastructure, and Kubernetes itself.

Looking Back: Principles of Cloud Computing

Before we get into control planes, we need to take a step back and analyze the rise and rapid adoption of cloud computing. The obvious reason is that cloud computing influenced Kubernetes, which applies these principles in its own architecture. Here are two insights from building cloud infrastructure:

  1. Infrastructure Is Programmable: Unlike earlier infrastructure, cloud providers needed to build compute and storage resources from the ground up to be driven entirely through API calls. With APIs available to spin up resources like compute and storage, both the provider and users can build higher-level resources on top (e.g. serverless compute). They can also create fundamental primitives like schedulers and control planes to manage resources through code.

  2. Infrastructure Is Declarative: To deliver infrastructure at scale, cloud providers needed to shield users from failure occurring in internal workflows. Exposing declarative APIs means that the focus is on the outcome, and lets the provider submerge the complexity of infrastructure provisioning under the hood.

Kubernetes integrates these ideas into an open architecture to set the stage for a universal control plane.

Infrastructure Is Programmable

The main factor that made cloud computing attractive on the technology side is that these services are designed from the ground up to be elastic, self-service, and API-driven. In this model, users do not have to file tickets or manually intervene to create virtual machines and databases. Instead, the complexity of spinning up virtual compute in a datacenter is encapsulated as a service, and the end user only needs to make a function call.

Like all code, once core services are exposed as APIs, they can be leveraged to create even higher-level abstractions by reusing functionality. Amazon Web Services (AWS) offerings like Fargate (containers) and Lambda (serverless functions) are higher-level compute services but ultimately run on EC2 virtual machines. The initial investment in API-first compute services pays dividends later by enabling specialized offerings that differentiate the vendor from competitors. Users of cloud platforms can leverage the same APIs to build their own abstractions as well. For example, Netflix built Titus to schedule and run container workloads with EC2 instances as the underlying hosts.
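This layering can be sketched in miniature: a higher-level "serverless" call built purely on a lower-level compute primitive. Everything below is an illustrative stand-in, not a real cloud API.

```python
# A toy "cloud" whose only primitive is an API call to create a VM.
class ComputePrimitive:
    def __init__(self):
        self.vms = []

    def create_vm(self, image):
        vm_id = f"vm-{len(self.vms)}"
        self.vms.append((vm_id, image))
        return vm_id

# A higher-level "serverless" service built entirely on the primitive:
# it hides capacity management behind a single function-invocation call.
class ServerlessService:
    def __init__(self, compute):
        self.compute = compute

    def invoke(self, handler, payload):
        # Under the hood: provision capacity via the same public primitive
        # available to every customer, then run the user's handler.
        self.compute.create_vm(image="function-runtime")
        return handler(payload)

cloud = ComputePrimitive()
faas = ServerlessService(cloud)
result = faas.invoke(lambda event: event["x"] * 2, {"x": 21})  # → 42
```

The point is not the toy code but the shape: because the primitive is an API, the higher-level service is just more code, available to the provider and (in principle) to any user of the platform.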

The takeaway is that the innovation of providing infrastructure primitives as APIs allowed cloud providers to build even higher level services. This layering of abstractions quickly set cloud providers apart from legacy commodity hosting providers who could not offer equivalent flexibility without large investment. Kubernetes would integrate these lessons into its own design philosophy.

Infrastructure Is Declarative

Infrastructure management was historically an imperative affair. To run a task like installing software, operators encode every step of the process in a runbook or automation. Since every step is explicit, when failure occurs the workflow must account for the issue and be able to recover, or at least fail partially. Unsurprisingly, most automation was not built to recover from the wide range of issues that could happen on a machine, ranging from software dependency incompatibilities to hardware issues such as failed disks. In the end, human operators would manually intervene to restore the system to a good state so that the process could be run again.

Cloud vendors faced the same challenges in spinning up software but could not rely on the techniques of the past. At scale, manual recovery from every encountered failure in a workflow would be impractical and uneconomic. The industry needed a way for users to define software infrastructure without exposing them to the multitude of failures that can happen inside internal systems. The eventual solution was the adoption of declarative APIs. With declarative APIs, the user tells the cloud what outcome they want to see instead of how to achieve it. Users can then work with cloud providers in a way that is closer to what they actually want to do: describe what deployments should look like rather than exactly how to deploy them.

Cloud providers, in turn, gained the flexibility to push the complexity of managing software and the underlying hardware behind the API. The vendor takes responsibility for the heavy lifting of creating infrastructure. In return, the vendor has the freedom to achieve the goal however it wishes: it is not constrained to always support specific steps in a workflow, as a cloud built on an imperative API would be. For example, a cloud vendor can orchestrate many services in the background to provide MySQL as a service. Later, it can change the implementation of any part of the workflow while the declarative API remains the same.
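The contrast can be made concrete with a minimal sketch. `FakeCloud` and its methods are hypothetical stand-ins for a provider's API, not any real service:

```python
class FakeCloud:
    """A stand-in provider exposing both API styles."""

    def __init__(self):
        self.desired_state = None
        self.steps = []

    # Imperative surface: each call is one explicit workflow step,
    # and each step is a point where the caller must handle failure.
    def create_vm(self, size):
        self.steps.append(("create_vm", size))
        return "vm-1"

    def install_package(self, vm, pkg):
        self.steps.append(("install", pkg))

    # Declarative surface: one call records the outcome the user wants;
    # the provider converges toward it however it likes, and can change
    # the internal workflow without changing this interface.
    def apply(self, spec):
        self.desired_state = spec

def deploy_imperatively(cloud):
    # The caller owns every step and every failure mode in between.
    vm = cloud.create_vm(size="large")
    cloud.install_package(vm, "mysql-server")

def deploy_declaratively(cloud):
    # The caller states the outcome; the provider owns the workflow.
    cloud.apply({"service": "mysql", "version": "8.0", "storage_gb": 100})
```

In the imperative version, every step the caller encodes is a commitment the provider must keep supporting; in the declarative version, the only commitment is the shape of the desired state.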

As an orchestrator for containers and infrastructure, Kubernetes faces the same challenges as cloud providers. It is unsurprising, then, that Kubernetes also bases its design heavily around declarative APIs as the core user interface.

First Class Abstractions

Declarative APIs are the defining factor in why cloud providers are attractive on the usability side. Architecting infrastructure this way reduces friction: it is orders of magnitude easier for both the provider to build and the user to consume. With the reduced friction, new abstractions in the form of higher-level services and primitives like compute schedulers are now commonplace, where they used to be limited to sophisticated software organizations.

The other side of the coin is that cloud computing offerings, the core primitives we use to build our own services, are locked behind proprietary APIs. Criticizing cloud vendors as proprietary may seem antithetical to the concepts described earlier. Even if the APIs to interact with the cloud are proprietary, that certainly does not stop users from creating their own abstractions with cloud compute and storage as the foundation. And one of the benefits of declarative systems is that much of the complexity of infrastructure provisioning is pushed under the surface. The point I am making is more subtle: although users can build on top of a cloud provider, they can never build abstractions that are first class the way a cloud provider can.

To explain what first class abstractions are, I like to contrast the experience of building infrastructure as an AWS engineer versus an engineer who uses AWS. Yes, high-level services like Lambda build on top of the same EC2 APIs offered to customers. But there is more to building infrastructure than calling APIs. It comes down to the architecture: the design and practices that AWS built over time to deliver multi-tenant infrastructure at scale. We can see the results in intangible concepts such as shuffle sharding and static stability, as well as concrete internal libraries and tooling for building control planes. From the outside, we may be able to use the same services, but we do not have all the other tools needed to create infrastructure of equivalent quality.

To be fair to Amazon and other cloud providers, what I am asking for can be considered their competitive advantage, and on top of that would be incredibly difficult to open-source. In effect, it is not enough to have accessible APIs. A new cloud would need to be architected from the ground up to expose the primitives needed to create first class abstractions.

It’s All In The API

When Google started the Kubernetes project, Amazon had a multi-year head start on building its cloud business. Without a game-changing innovation, Google Cloud faced a massive uphill battle just to catch up. Kubernetes is the answer, and it aims to level the playing field by acting as the first and ultimate layer developers use to manage infrastructure.

This vision is not exactly a secret. Kelsey Hightower alludes to this goal quite often, and Daniel Smith lays out an ambitious future where the Kubernetes API can manage everything from the virtual (like virtual machines) to physical (like routers). Why is it possible now with Kubernetes? It comes back to the idea of making first class abstractions much easier to create. The API, which can be better described as the Kubernetes Resource Model, provides primitives and enforces standards that can be used to model any infrastructure type.

The Kubernetes API starts by modeling all data in terms of resources, and the requirements enforced on resources are quite strict. Resource representations must all follow a standard structure (top-level fields of apiVersion, kind, metadata, and spec) and must support API verbs with consistent behaviour no matter the resource (GET, PUT, POST, DELETE, etc.). In return for extensive standardization on all resources, the system as a whole gains massive leverage. Kubernetes tooling and libraries work consistently on all types without the need for customization per resource. Much of the specialized work on resources that would otherwise bloat the API server instead moves to other components. The Kubernetes Resource Model is a series of tradeoffs for the sake of complexity management: because each resource is consistent and requires no new, unique code to support, and because each resource remains a small piece that outside components can work on, Kubernetes can support a vast number of different resource types at a time.

Kubernetes trades in hard API standards and in return greatly simplifies defining new resources (declarative representations of the things we want to manage through Kubernetes). What can be done with an API designed this way? We can model all of our infrastructure as resources. Kubernetes already comes with a core set of resources for running workloads inside a cluster. Custom Resource Definitions and Controllers take things to the next level by enabling third parties to extend Kubernetes with new resource types. With Custom Resources, we can create our own first class abstractions, ranging from building new types from existing resources (exactly how Deployments build on Pods and ReplicaSets) to defining cloud services (like AWS S3 buckets or Azure Cosmos DB) as resources Kubernetes understands. Thanks to the standardized resource structure, controllers can be agnostic to the differences between resources as well. For example, a controller that checks labels on resources for policy compliance does not need to know what each resource does; the structure is consistent, and the controller can look in the same place on every object.
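A small sketch shows how the uniform structure pays off. The dicts below mimic the standard Kubernetes resource shape (apiVersion, kind, metadata, spec); the label policy check itself is hypothetical, not a real Kubernetes admission controller:

```python
# Two resources of very different kinds: a built-in Deployment and an
# imagined custom resource. Both follow the same top-level structure.
resources = [
    {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "web", "labels": {"team": "storefront"}},
        "spec": {"replicas": 3},
    },
    {
        "apiVersion": "example.com/v1",  # a hypothetical custom resource
        "kind": "S3Bucket",
        "metadata": {"name": "assets", "labels": {}},
        "spec": {"region": "us-east-1"},
    },
]

def missing_required_labels(resource, required=("team",)):
    # The policy check never inspects `spec`. Because every resource keeps
    # labels in the same place, one function covers all kinds, including
    # kinds that did not exist when the function was written.
    labels = resource["metadata"].get("labels", {})
    return [key for key in required if key not in labels]

for r in resources:
    missing = missing_required_labels(r)
    if missing:
        print(f'{r["kind"]}/{r["metadata"]["name"]} missing labels: {missing}')
```

This is the leverage the standardization buys: generic code written once, applicable to every resource type the cluster will ever serve.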

Since I said earlier that the vision of Kubernetes centers around the API rather than container orchestration, readers may conclude that the container part does not matter much at all. In fact, containers are critical to why the API is so extensible. To be more specific, it is the ability to easily run arbitrary computation against API resources (in the form of Controllers) that drives most of the value of Kubernetes as an orchestrator. Controllers can range from handling small tasks (annotating resources with labels, cleaning up resources) to full-fledged Operators that manage the entire lifecycle of an application. As with API resources, Kubernetes simplifies registering new controllers, enabling new behaviour to work on API types without extending the Kubernetes codebase.

Defining Control Planes

It is a good idea to define what a control plane means here, especially in this newer context of software infrastructure. The original concept came out of network routing where the control plane tracks and manages the network topology and the packet routing rules while the data plane actively processes network requests. Marc Brooker generalizes the concept further:

  1. Data Plane components sit directly on the request path. The components are required to be up for a request to succeed and scale linearly with the number of requests to the system.

  2. Control Plane components help the data plane do its work. Responsibilities include resource management (adding and removing resources), failure tolerance (observing and correcting hardware/software failures), and deployments (changing the system over time). These meta-tasks require a full view of the resource topology and scale sublinearly compared to components on the request path. Since the control plane is not required for fulfilling requests, it can break for some period of time without affecting the data plane.

Control planes are a foundational building block of modern cloud computing, underpinning infrastructure offerings spanning virtual machines, containers, cloud functions, databases, buckets, DNS, and more. Given their importance to cloud infrastructure, I find that in practice they are not discussed often among cloud practitioners. I think it is because control planes are an implementation detail when building new kinds of infrastructure. Users are shielded from the complexities of managing control planes (thanks to declarative APIs), but they also lose out on exposure to the primitives cloud providers use to build their own services. This aspect is where Kubernetes is changing the state of play. Calling back to an earlier topic, Kubernetes provides the primitives for building control planes as first class abstractions. Kubernetes democratizes infrastructure service management and creates a space where third-party cloud resources can be managed and used alongside community-driven service implementations.

Controlling Systems Through Loops

What are the mechanisms that make control planes so well suited for infrastructure management? Colm MacCárthaigh dives into (1 2) how he views control planes through the lens of control theory and how AWS applies these principles to build infrastructure services. Control theory is a wide field that cuts across multiple industries; here are the key concepts we need to understand for infrastructure management:

  1. Declare Desired State: Systems are not static; they respond to external and internal conditions and change over time. A stable system requires a known state to drive towards.

  2. Control Loops: The core primitive that binds the control and data planes. The loop consists of observing data plane components, determining whether stabilizing action needs to be taken, then applying the change back to the data plane. Like the systems they observe, control loops are continuous: a feedback mechanism that constantly corrects divergence from the desired state.

How does Kubernetes apply control plane principles? The Kubernetes API is all about the management of declarative state: how to create and update it, and how to converge the system to that state. As for control loops, well, it is right there in the documentation. Kubernetes innovates on this idea by creating generic machinery to manage all resources as potential control planes. Inside Controllers, Kubernetes provides components like Informers and Workqueues that implement the tricky logic of building control loops.
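The shape of a control loop can be shown with a toy reconciler. Real controllers use Informers and Workqueues against the API server; this sketch substitutes plain dicts and a bounded loop, and the scaling logic is purely illustrative:

```python
def reconcile(desired, observed):
    """Compute the actions needed to converge observed toward desired."""
    actions = []
    current = observed.get("replicas", 0)
    if current < desired["replicas"]:
        actions.append(("scale_up", desired["replicas"] - current))
    elif current > desired["replicas"]:
        actions.append(("scale_down", current - desired["replicas"]))
    return actions

def control_loop(desired, observed, max_iterations=10):
    # Observe, diff, act, repeat: the loop keeps running until no
    # corrective action remains, i.e. the system matches the declared state.
    for _ in range(max_iterations):
        actions = reconcile(desired, observed)
        if not actions:
            break
        for op, count in actions:
            delta = count if op == "scale_up" else -count
            observed["replicas"] = observed.get("replicas", 0) + delta
    return observed
```

The key property is that the loop is level-triggered: it acts on the current difference between desired and observed state, so it corrects drift from any cause, not just the events it happened to see.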

Today, application developers leverage a catalog of systems to build their own products. If an application needs to store binary blobs, it can write files to an object storage service. If an application needs to manage a stream of events, it can integrate a message queue service into the application flow. Like a set of building blocks, applications compose infrastructure services that solve complex problems so they do not have to solve them from scratch. Kubernetes offers a similar set of building blocks, but for the developers who build core infrastructure for others.

There is an observation that removing friction from an action leads to that action being done much more frequently. Kubernetes is headed toward that same outcome with control planes. I think we will see control planes used to manage many more kinds of software. Systems are dynamic, and control loops are a powerful way to manage this reality. The bottleneck was always the implementation complexity, and Kubernetes is making this part tractable through Controllers and Operators paired with the Kubernetes Resource Model.

A Universal Control Plane

The common theme threading through the Kubernetes architecture is standardization. The Kubernetes Resource Model leans heavily on enforcing consistency in how resources are structured and tightly constrains the allowed access paths to the data. In return for the limitations on the data model [1], the ecosystem gains leverage and can use the same primitives across many different abstractions. The primitives also implement the core principles of cloud computing: resources are layered to create higher-level abstractions (like how Deployments abstract Pods), and Controllers allow resources to be declarative by pushing the implementation complexity down into the underlying code.

If there is a takeaway to be had, it is that the idea of a universal control plane depends more on the Kubernetes API design than on container orchestration. Yes, without containers, Kubernetes is not much more than a database with a specific schema; it is the controllers that give resources meaning. But what makes the project more than a container orchestrator is that we can model abstractions by the data that defines them, then manage and change those abstractions over time with consistent tooling and powerful control loops.

Over time, we will see the Kubernetes project evolve in ways that may be surprising. I will expand on my own ideas on what the future of Kubernetes will look like in the next post, Kubernetes Unbundling. I will leave you with a question: What exactly can we do now that we have a Universal Control Plane?

Footnotes

[1] There are real costs to this architecture; all parts of the system have overhead from having to call the Kubernetes API and continuously update the data model. There is also an extreme lack of encapsulation in the resource model (see this thread by Adam Jacob on Kubernetes’ lack of a rich inner data model).