A deep dive into Kubernetes controllers

Kubernetes runs a group of controllers that take care of routine tasks and ensure that the observed state of the cluster matches the desired state. For example, the ReplicaSet controller maintains the correct number of pods running in the cluster, and the node controller monitors the state of nodes and responds when nodes go down. In short, each controller is responsible for a particular resource in the Kubernetes world, and to manage a cluster effectively it is important that users understand the role each controller plays. But have you ever wondered how Kubernetes controllers actually work? Or, even more exciting, have you ever thought about writing your own custom controller?

This is the first post in a series of two in which I give an overview of the internals of a Kubernetes controller: its essential components and how it works. In the second post, I will guide you through writing your own controller to build a simple notification system for your cluster.

All the blocks of code used in this post are taken from the current implementation of Kubernetes controllers, which are written in Go and based on the client-go library.

Controller pattern

The best explanation of a Kubernetes controller can be found in the official Kubernetes documentation:

In applications of robotics and automation, a control loop is a non-terminating loop that regulates the state of the system. In Kubernetes, a controller is a control loop that watches the shared state of the cluster through the API server and makes changes attempting to move the current state towards the desired state. Examples of controllers that ship with Kubernetes today are the replication controller, endpoints controller, namespace controller, and serviceaccounts controller.

Kubernetes official documentation, Kube-controller-manager

To reduce complexity, all controllers are packaged and shipped in a single daemon named kube-controller-manager. The simplest implementation of a controller is a loop:

for {
  desired := getDesiredState()
  current := getCurrentState()
  makeChanges(desired, current)
}

Controller components

A controller has two main components: the Informer/SharedInformer and the Workqueue. The Informer/SharedInformer watches for changes to the current state of Kubernetes objects and sends events to the Workqueue, where they are then popped by worker(s) for processing.

Informer

The vital role of a Kubernetes controller is to watch objects for the desired state and the actual state, then send instructions to make the actual state more like the desired state. In order to retrieve an object's information, the controller sends a request to the Kubernetes API server.

However, repeatedly retrieving information from the API server can become expensive. Thus, in order to get and list objects multiple times in code, Kubernetes developers end up using the cache which has already been provided by the client-go library. Additionally, the controller doesn't really want to send requests continuously: it only cares about events when the object has been created, modified or deleted. The client-go library provides the ListWatcher interface that performs an initial list and starts a watch on a particular resource:

// NewListWatchFromClient takes the REST client, the resource name (plural),
// the namespace and a field selector.
lw := cache.NewListWatchFromClient(
      client,
      "pods",
      v1.NamespaceAll,
      fieldSelector)

All of these things are consumed in Informer. A general structure of an Informer is described below:

store, controller := cache.NewInformer(
    &cache.ListWatch{},
    &v1.Pod{},
    resyncPeriod,
    cache.ResourceEventHandlerFuncs{},
)

Although the Informer has not been used much in current Kubernetes code (the SharedInformer, which I will explain later, is used instead), it is still an essential concept to understand, especially when you want to write a custom controller. The following are the three patterns used to construct the Informer:

ListWatcher

The ListWatcher is a combination of a list function and a watch function for a specific resource in a specific namespace. This helps the controller focus only on the particular resource it wants to look at. The field selector is a type of filter which narrows down the result of searching for a resource, for example when the controller wants to retrieve only resources matching a specific field. The structure of a ListWatcher is described below:

lw := &cache.ListWatch{
    ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
        return client.Get().
            Namespace(namespace).
            Resource(resource).
            VersionedParams(&options, metav1.ParameterCodec).
            FieldsSelectorParam(fieldSelector).
            Do().
            Get()
    },
    WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
        options.Watch = true
        return client.Get().
            Namespace(namespace).
            Resource(resource).
            VersionedParams(&options, metav1.ParameterCodec).
            FieldsSelectorParam(fieldSelector).
            Watch()
    },
}
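
For completeness, the fieldSelector used above can be built with the fields package from k8s.io/apimachinery. The following is a minimal sketch; the pod name is purely illustrative:

// Watch only the pod named "my-pod" (illustrative name);
// use fields.Everything() when no filtering is needed.
fieldSelector := fields.OneTermEqualSelector("metadata.name", "my-pod")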

Resource Event Handler

Resource Event Handler is where the controller handles notifications for changes on a particular resource:

type ResourceEventHandlerFuncs struct {
    AddFunc    func(obj interface{})
    UpdateFunc func(oldObj, newObj interface{})
    DeleteFunc func(obj interface{})
}
  • AddFunc is called when a new resource is created.
  • UpdateFunc is called when an existing resource is modified. The oldObj is the last known state of the resource. UpdateFunc is also called when a re-synchronization happens, and it gets called even if nothing changes.
  • DeleteFunc is called when an existing resource is deleted. It gets the final state of the resource (if it is known). Otherwise, it gets an object of type DeletedFinalStateUnknown. This can happen if the watch is closed and misses the delete event and the controller doesn't notice the deletion until the subsequent re-list.
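
To make this concrete, here is a hedged sketch of these handlers plugged into the Informer constructed earlier; the log statements and the lw and resyncPeriod variables are illustrative, not taken from the Kubernetes source:

store, controller := cache.NewInformer(
    lw,           // the ListWatcher from the previous section
    &v1.Pod{},
    resyncPeriod,
    cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            log.Printf("pod added: %v", obj)
        },
        UpdateFunc: func(oldObj, newObj interface{}) {
            log.Printf("pod updated: %v", newObj)
        },
        DeleteFunc: func(obj interface{}) {
            // obj may be a cache.DeletedFinalStateUnknown wrapper here.
            log.Printf("pod deleted: %v", obj)
        },
    },
)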

ResyncPeriod

ResyncPeriod defines how often the controller goes through all items remaining in the cache and fires the UpdateFunc again. This provides a kind of configuration to periodically verify the current state and make it like the desired state.

It's extremely useful in cases where the controller may have missed updates or prior actions failed. However, if you build a custom controller, be careful about CPU load if the resync period is too short.
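
As a small sketch, the resync period is simply the duration passed to the informer constructor; 30 seconds here is only an example value:

// Every cached item is re-delivered to UpdateFunc roughly every 30 seconds,
// even if nothing has changed. Passing 0 disables periodic resyncs.
resyncPeriod := 30 * time.Second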

SharedInformer

The Informer creates a local cache of a set of resources used only by itself. But in Kubernetes, there is a bundle of controllers running and caring about multiple kinds of resources. This means that there will be an overlap: one resource is being watched and cached by more than one controller.

In this case, the SharedInformer helps to create a single shared cache among controllers. This means cached resources won't be duplicated and by doing that, the memory overhead of the system is reduced. Besides, each SharedInformer only creates a single watch on the upstream server, regardless of how many downstream consumers are reading events from the informer. This also reduces the load on the upstream server. This is common for the kube-controller-manager which has so many internal controllers.

The SharedInformer provides hooks to receive notifications when a particular resource is added, updated or deleted. It also provides convenience functions for accessing shared caches and determining when a cache is primed. This saves us connections to the API server, duplicate serialization costs server-side, duplicate deserialization costs controller-side, and duplicate caching costs controller-side.

lw := cache.NewListWatchFromClient(…)
sharedInformer := cache.NewSharedInformer(lw, &v1.Pod{}, resyncPeriod)
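
The hooks mentioned above are registered with AddEventHandler, and the informer is then started with Run; the empty handler bodies below are just a sketch:

sharedInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc:    func(obj interface{}) { /* enqueue work, see Workqueue below */ },
    UpdateFunc: func(oldObj, newObj interface{}) { /* enqueue work */ },
    DeleteFunc: func(obj interface{}) { /* enqueue work */ },
})

// Run blocks until stopCh is closed, so start it in a goroutine.
go sharedInformer.Run(stopCh)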

Workqueue

The SharedInformer can't track where each controller is up to (because it's shared), so the controller must provide its own queuing and retrying mechanism (if required). Hence, most Resource Event Handlers simply place items onto a per-consumer workqueue.

Whenever a resource changes, the Resource Event Handler puts a key into the Workqueue. A key has the format <resource_namespace>/<resource_name>, unless <resource_namespace> is empty, in which case it's just <resource_name>. By doing that, events are collapsed by key, so each consumer can use worker(s) to pop keys off the queue, and this work is done sequentially. This guarantees that no two workers will work on the same key at the same time.

Workqueue is provided in the client-go library at client-go/util/workqueue. There are several kinds of queues supported including the delayed queue, timed queue and rate limiting queue.

The following is an example for creating a rate limiting queue:

queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
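
A Resource Event Handler then typically just derives a key and adds it to this queue; the sketch below assumes the shared informer from the previous section:

sharedInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc: func(obj interface{}) {
        // MetaNamespaceKeyFunc produces "<namespace>/<name>",
        // or just "<name>" for cluster-scoped resources.
        if key, err := cache.MetaNamespaceKeyFunc(obj); err == nil {
            queue.Add(key)
        }
    },
    UpdateFunc: func(oldObj, newObj interface{}) {
        if key, err := cache.MetaNamespaceKeyFunc(newObj); err == nil {
            queue.Add(key)
        }
    },
    DeleteFunc: func(obj interface{}) {
        // DeletionHandlingMetaNamespaceKeyFunc also copes with the
        // DeletedFinalStateUnknown case described earlier.
        if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
            queue.Add(key)
        }
    },
})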

Workqueue provides convenience functions to manage keys. The following figure describes the key's life-cycle in Workqueue:

Key's life-cycle in Workqueue

If processing an event fails, the controller calls the AddRateLimited() function to push its key back to the workqueue for a later retry, up to a predefined number of retries. Otherwise, if the processing is successful, the key can be removed from the workqueue by calling the Forget() function. However, that function only stops the workqueue from tracking the history of the event. In order to remove the event completely from the workqueue, the controller must trigger the Done() function.
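
Putting these calls together, a typical worker looks like the hedged sketch below; the maxRetries constant and the processItem method are illustrative, not part of client-go:

func (c *Controller) processNextItem() bool {
    key, shutdown := c.queue.Get()
    if shutdown {
        return false
    }
    // Done tells the queue we have finished processing this key, whatever the outcome.
    defer c.queue.Done(key)

    err := c.processItem(key.(string)) // illustrative business logic
    if err == nil {
        // Success: stop tracking the retry history for this key.
        c.queue.Forget(key)
    } else if c.queue.NumRequeues(key) < maxRetries {
        // Failure: re-queue the key with a rate-limited delay.
        c.queue.AddRateLimited(key)
    } else {
        // Too many retries: give up on this key.
        c.queue.Forget(key)
    }
    return true
}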

So the workqueue can handle notifications from the cache, but the question is: when should the controller start workers processing the workqueue? There are two reasons why the controller should wait until the cache is completely synchronized before processing, in order to achieve the latest state:

  1. Listing all the resources will be inaccurate until the cache has finished synchronising.

  2. Multiple rapid updates to a single resource will be collapsed into the latest version by the cache/queue. Therefore, it must wait until the cache becomes idle before actually processing items to avoid wasted work on intermediate states.

The pseudo-code below describes that practice:

controller.informer = cache.NewSharedInformer(...)
controller.queue = workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

// Run blocks, so start the informer in its own goroutine.
go controller.informer.Run(stopCh)

if !cache.WaitForCacheSync(stopCh, controller.informer.HasSynced) {
    log.Errorf("Timed out waiting for caches to sync")
    return
}

// Now start processing
controller.runWorker()
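
For completeness, runWorker in the sketch above can simply loop over processNextItem until the queue shuts down; this is illustrative, not the exact kube-controller-manager code:

func (c *Controller) runWorker() {
    // processNextItem returns false when the workqueue is shut down.
    for c.processNextItem() {
    }
}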

What's next

So far I have given you an overview of Kubernetes controllers: what they are, what they are used for, which components they are built from and how they really work. The most exciting thing is that Kubernetes lets users integrate their own controllers. The second part is even more fun: I will show you a use case for a custom controller and guide you through writing one in a few lines of code.

Let's move on!
