In this blog series, I will show you how to “dockerize”, “kubernify” and monitor an application.

In the first part, we made an app that consists of an nginx server serving a “Hello World” page.

In the 2nd part, we set up our minikube cluster and hosted our app on it.

In this 3rd part, we'll see how we can set up our app to be fault-tolerant and highly available.

0. Prerequisites

  • A Kubernetes cluster (in my case, it's a minikube cluster v1.5.2).
  • A running application on this Kubernetes cluster.

1. Describing the problems

First of all, let's differentiate 2 types of downtime: expected ones and unexpected ones.

Expected downtime happens when cluster administrators perform maintenance on some nodes, for example work that requires a restart.

Unexpected downtime happens when your application crashes for some reason (an out-of-memory error, for example).

To tackle this downtime nightmare, we'll use 2 Kubernetes resources:

  • PodDisruptionBudget: to tackle expected downtime. This instructs the cluster to keep a minimum number of replicas of our app running during voluntary disruptions such as a node drain.
  • Deployment: to tackle unexpected downtime. This lets us run a specific number of replicas of our application, so if one of them crashes, we're sure there are other replicas still running. That's fewer than we wanted, but it's better than zero replicas.

2. Deployment is the new Pod

In a production environment, you should never create a Pod resource on its own. It should always be spawned by what we call controller resources. The one you'll use most often is the Deployment resource.

Quoting the official doc:

A Deployment provides declarative updates for Pods and ReplicaSets. You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state at a controlled rate.

In other words, we need to delete our manually created pod (kubectl delete pod nhw-pod) and create a Deployment resource that will manage pods for us.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nhw-depl
  labels:
    name: nhw-depl
spec:
  replicas: 3
  selector:
    matchLabels:
      name: nhw-pod
  template:
    metadata:
      labels:
        name: nhw-pod
    spec:
      containers:
      - name: nhw-container
        image: nhw
        ports:
        - containerPort: 80
        imagePullPolicy: IfNotPresent

Save this manifest as deployment.yml and apply it:

❯ kubectl apply -f deployment.yml
deployment.apps/nhw-depl created
❯ kubectl get deploy
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
nhw-depl   3/3     3            3           4m40s
❯ kubectl get replicasets
NAME                  DESIRED   CURRENT   READY   AGE
nhw-depl-6c4669f4fc   3         3         3       4m46s
❯ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
nhw-depl-6c4669f4fc-m2xrm   1/1     Running   0          4m52s
nhw-depl-6c4669f4fc-mkll4   1/1     Running   0          4m52s
nhw-depl-6c4669f4fc-tvrkg   1/1     Running   0          4m52s

When the Deployment is applied, it creates a ReplicaSet, which in turn creates and maintains the 3 Pods declared in the YAML file.

If a Pod crashes or we delete it manually, Kubernetes will automatically spin up a new replacement Pod 😎:

❯ kubectl delete pod nhw-depl-6c4669f4fc-m2xrm
pod "nhw-depl-6c4669f4fc-m2xrm" deleted
❯ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
nhw-depl-6c4669f4fc-mkll4   1/1     Running   0          6m51s
nhw-depl-6c4669f4fc-tvrkg   1/1     Running   0          6m51s
nhw-depl-6c4669f4fc-tzvnc   1/1     Running   0          7s
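
To make the self-healing behaviour more concrete, here is a toy model in Python of what a controller does: compare the desired replica count with the actual one and fix the difference. This is only a sketch of the idea, not how Kubernetes is implemented; the reconcile function and the pod names are made up for illustration.

```python
import itertools

def reconcile(desired_replicas, running_pods, name_source):
    """Return the pod list after one reconciliation pass (toy model)."""
    pods = list(running_pods)
    # Create replacements until the actual count reaches the desired count.
    while len(pods) < desired_replicas:
        pods.append(next(name_source))
    # Drop surplus pods if there are too many.
    return pods[:desired_replicas]

# Hypothetical pod names, for illustration only.
names = (f"pod-{i}" for i in itertools.count(1))

pods = reconcile(3, [], names)    # initial rollout: pod-1, pod-2, pod-3
pods.remove("pod-1")              # simulate `kubectl delete pod`
pods = reconcile(3, pods, names)  # the controller restores the count
print(pods)                       # ['pod-2', 'pod-3', 'pod-4']
```

Just like in the kubectl output above, the deleted pod does not come back; a brand new pod with a new name takes its place.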

3. Do not disturb! 🚫

First, let's create a PodDisruptionBudget resource that makes sure at least 1 pod stays available during voluntary disruptions, by setting minAvailable to 1:

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nhw-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      name: nhw-pod

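This manifest uses policy/v1beta1, which matches the Kubernetes v1.16 cluster used here; on Kubernetes 1.21 and later, use policy/v1 instead. Our resource is called nhw-pdb, and it only applies to pods whose name label is equal to nhw-pod.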
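
If the matchLabels rule feels abstract, it boils down to this: a pod is selected when every key/value pair of the selector is present in the pod's labels. Extra labels on the pod don't matter. Here is a minimal Python sketch of that rule; selector_matches and other_pod are hypothetical names used only for this example.

```python
def selector_matches(match_labels, pod_labels):
    """Return True if every key/value pair in match_labels is set on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())

pdb_selector = {"name": "nhw-pod"}

# Labels of a pod created by our Deployment; the extra label is illustrative.
app_pod = {"name": "nhw-pod", "pod-template-hash": "6c4669f4fc"}
# A hypothetical pod from some other application.
other_pod = {"name": "other-app"}

print(selector_matches(pdb_selector, app_pod))    # True
print(selector_matches(pdb_selector, other_pod))  # False
```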

Save it as pdb.yml and apply it:

❯ kubectl apply -f pdb.yml
poddisruptionbudget.policy/nhw-pdb created
❯ kubectl get pdb
NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nhw-pdb   1               N/A               0                     4h54m
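
The ALLOWED DISRUPTIONS column is derived from the number of matching pods that are healthy at the moment you run the command. As a rough Python sketch, assuming an integer minAvailable (the function name is made up, and the real controller handles more cases, such as percentages):

```python
def allowed_disruptions(healthy_pods, min_available):
    """Evictions the budget permits right now, never below zero."""
    return max(0, healthy_pods - min_available)

print(allowed_disruptions(healthy_pods=3, min_available=1))  # 2
print(allowed_disruptions(healthy_pods=1, min_available=1))  # 0
```

So with minAvailable: 1, a value of 0 means that no more than 1 matching pod was counted as healthy when the command ran.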

Now, to see it in action, we need to drain the node on which our nhw-pod replicas are scheduled by running kubectl drain minikube --ignore-daemonsets --force.

Draining cordons the node (marks it as unschedulable) and evicts the pods running on it. Kubernetes then has to schedule replacement pods on another node.

However, since we only have a single-node cluster, the pods cannot be rescheduled anywhere else, and the drain will never finish.

More precisely, since our PodDisruptionBudget requires at least 1 available pod, Kubernetes first evicts 2 pods from our node and tries to schedule their replacements somewhere else. Only once one of the replacement pods is in the Running state will it evict the 3rd and last pod from our node.
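
The eviction sequence described above can be modelled with a small Python sketch. It is a deliberately simplified toy: the drain function, its parameters and the pod names are hypothetical, the eviction order in a real cluster is not guaranteed, and pods are assumed to become healthy as soon as another node can host them.

```python
def drain(pods_on_node, min_available, has_other_node):
    """Toy model of draining one node under a PodDisruptionBudget.

    Returns (evicted, blocked): pods evicted from the node, and pods whose
    eviction keeps being refused because it would violate the budget.
    """
    remaining = list(pods_on_node)  # healthy pods still on the drained node
    evicted = []
    rescheduled = 0                 # replacements already Running elsewhere
    while remaining:
        healthy = len(remaining) + rescheduled
        if healthy - min_available > 0:
            # The budget allows one more disruption: evict a pod.
            evicted.append(remaining.pop(0))
        elif has_other_node and rescheduled < len(evicted):
            # Pending replacements start on another node and become healthy.
            rescheduled = len(evicted)
        else:
            # Replacements stay Pending, so the last eviction is never allowed.
            break
    return evicted, remaining

pods = ["pod-a", "pod-b", "pod-c"]  # hypothetical names
print(drain(pods, min_available=1, has_other_node=False))
# (['pod-a', 'pod-b'], ['pod-c'])
print(drain(pods, min_available=1, has_other_node=True))
# (['pod-a', 'pod-b', 'pod-c'], [])
```

The first call is our single-node situation: 2 pods get evicted and the 3rd one stays blocked. The second call shows what would happen with a second node available.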

In our case, it loops forever, since we only have one node:

❯ kubectl drain minikube --ignore-daemonsets --force
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-2h7d7
...
evicting pod "nhw-depl-6c4669f4fc-tzvnc"
error when evicting pod "nhw-depl-6c4669f4fc-tzvnc" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "nhw-depl-6c4669f4fc-tzvnc"
error when evicting pod "nhw-depl-6c4669f4fc-tzvnc" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "nhw-depl-6c4669f4fc-tzvnc"
error when evicting pod "nhw-depl-6c4669f4fc-tzvnc" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "nhw-depl-6c4669f4fc-tzvnc"
...
❯ kubectl get pods -o wide
NAME                        READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
nhw-depl-6c4669f4fc-8k5qr   0/1     Pending   0          4m47s   <none>       <none>     <none>           <none>
nhw-depl-6c4669f4fc-q5882   0/1     Pending   0          4m47s   <none>       <none>     <none>           <none>
nhw-depl-6c4669f4fc-tzvnc   1/1     Running   0          22m     172.17.0.2   minikube   <none>           <none>

We can see that the 2 replacement pods cannot find any node to be scheduled on, so they're stuck in the Pending state. The 3rd pod stays in the Running state because the PodDisruptionBudget blocks its eviction.

You can now cancel the drain by hitting Ctrl+C and uncordon our single node to re-enable scheduling on it. 🔗

❯ kubectl get nodes
NAME       STATUS                     ROLES    AGE   VERSION
minikube   Ready,SchedulingDisabled   master   66d   v1.16.2
❯ kubectl uncordon minikube
node/minikube uncordoned
❯ kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
minikube   Ready    master   66d   v1.16.2

And that's it: our application is now protected against both expected and unexpected downtime.

Next time your Kubernetes administrator is performing maintenance on the nodes, you'll be like: It's fine