Docker to Kubernetes [PART 3]: Completely avoid downtime in a Kubernetes cluster with PodDisruptionBudgets and Deployments
In this blog series, I will show you how to “dockerize”, “kubernify” and monitor an application.
In the first part, we made an app that consists of an nginx server serving a “Hello World” page.
In the 2nd part, we set up our minikube cluster and hosted our app on it.
In this 3rd part, we'll see how we can set up our app to be fault-tolerant and highly available.
- A Kubernetes cluster (in my case, it's a minikube cluster v1.5.2).
- A running application on this Kubernetes cluster.
1. Describing the problems
First of all, let's differentiate 2 types of downtime: expected ones and unexpected ones.
Expected downtime occurs when cluster administrators perform maintenance on some nodes, for example maintenance that requires a restart.
Unexpected downtime occurs when your application crashes for some reason (an OutOfMemory error, for example).
To tackle this downtime nightmare, we'll use 2 Kubernetes resources:
- PodDisruptionBudget: To tackle expected downtime. It instructs the cluster to keep a minimum number of replicas of our app running during voluntary disruptions such as node maintenance.
- Deployment: To tackle unexpected downtime. It allows us to spawn a specific number of replicas of our application, so if one of them crashes, we're sure there are other replicas running. That is fewer than we wanted, but it's better than zero replicas.
2. Deployment is the new Pod
In a production environment, you should never create a Pod resource on its own. It should always be spawned by what we call controller resources. The one you will use the most is the Deployment resource.
Quoting the official doc:
A Deployment provides declarative updates for Pods and ReplicaSets. You describe a desired state in a Deployment, and the Deployment Controller changes the actual state to the desired state at a controlled rate.
That is, we need to delete our manually created pod with kubectl delete pod nhw-pod and create a Deployment resource that will manage pods for us.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nhw-depl
  labels:
    name: nhw-depl
spec:
  replicas: 3
  selector:
    matchLabels:
      name: nhw-pod
  template:
    metadata:
      labels:
        name: nhw-pod
    spec:
      containers:
        - name: nhw-container
          image: nhw
          ports:
            - containerPort: 80
          imagePullPolicy: IfNotPresent
❯ kubectl apply -f deployment.yml
deployment.apps/nhw-depl created
❯ kubectl get deploy
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
nhw-depl   3/3     3            3           4m40s
❯ kubectl get replicasets
NAME                  DESIRED   CURRENT   READY   AGE
nhw-depl-6c4669f4fc   3         3         3       4m46s
❯ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
nhw-depl-6c4669f4fc-m2xrm   1/1     Running   0          4m52s
nhw-depl-6c4669f4fc-mkll4   1/1     Running   0          4m52s
nhw-depl-6c4669f4fc-tvrkg   1/1     Running   0          4m52s
When the Deployment resource is applied, it creates a ReplicaSet that controls the number of Pod replicas. The ReplicaSet in turn creates the 3 Pods declared in the YAML file.
If a Pod crashes or we delete it manually, Kubernetes will automatically spin up a new replacement Pod 😎:
❯ kubectl delete pod nhw-depl-6c4669f4fc-m2xrm
pod "nhw-depl-6c4669f4fc-m2xrm" deleted
❯ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
nhw-depl-6c4669f4fc-mkll4   1/1     Running   0          6m51s
nhw-depl-6c4669f4fc-tvrkg   1/1     Running   0          6m51s
nhw-depl-6c4669f4fc-tzvnc   1/1     Running   0          7s
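The Deployment → ReplicaSet → Pod chain described above can be checked by reading each object's owner reference. Here is a minimal sketch; the helper name owner_of is hypothetical, and it assumes kubectl is pointed at the same cluster and that each object has a single owner.

```shell
# owner_of KIND NAME
# Hypothetical helper: prints the kind and name of the controller that owns
# the given object, read from its first metadata.ownerReferences entry.
owner_of() {
  kubectl get "$1" "$2" \
    -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
}
```

With the pods listed above, owner_of pod nhw-depl-6c4669f4fc-mkll4 should print ReplicaSet/nhw-depl-6c4669f4fc, and owner_of replicaset nhw-depl-6c4669f4fc should print Deployment/nhw-depl. This is why deleting a pod by hand is harmless here: the ReplicaSet notices the missing replica and creates a new one.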
3. Do not disturb! 🚫
First, let's create a PodDisruptionBudget resource that makes sure at least 1 pod is always running, using the following manifest:
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: nhw-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      name: nhw-pod
Here, our resource is called nhw-pdb, and it only applies to pods that have the label name equal to nhw-pod.
Apply it with kubectl apply -f pdb.yml.
❯ kubectl apply -f pdb.yml
poddisruptionbudget.policy/nhw-pdb created
❯ kubectl get pdb
NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
nhw-pdb   1               N/A               0                     4h54m
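The manifest above uses policy/v1beta1, which matches the cluster version used in this post (v1.16.2). On Kubernetes 1.21 and later the same resource is also served as policy/v1, and policy/v1beta1 is no longer served as of 1.25. If you follow along on a recent cluster, something like the following sketch should work; the file name pdb-v1.yml is hypothetical, and only the apiVersion differs from the manifest above.

```shell
# Write a policy/v1 variant of the PodDisruptionBudget to a hypothetical
# file, pdb-v1.yml. Everything except apiVersion is copied from pdb.yml.
cat > pdb-v1.yml <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: nhw-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      name: nhw-pod
EOF
```

You would then apply it the same way, with kubectl apply -f pdb-v1.yml.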
Now, to actually see it in action, we need to drain the node on which our nhw-pod replicas are scheduled by running kubectl drain minikube --ignore-daemonsets --force.
Draining marks the node as unschedulable and evicts the pods running on it. Kubernetes then has to schedule replacement pods on another node.
However, since we only have a single-node cluster, it's not possible to reschedule the pods somewhere else, and the drain will never finish.
More precisely, since we configured our PodDisruptionBudget to require at least 1 running pod, Kubernetes first evicts 2 pods from our node and tries to schedule their replacements somewhere else. Only when one of the new pods reaches the Running state will it evict the 3rd and last pod from our node.
In our case, it loops forever, since we only have one node:
❯ kubectl drain minikube --ignore-daemonsets --force
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-proxy-2h7d7
...
evicting pod "nhw-depl-6c4669f4fc-tzvnc"
error when evicting pod "nhw-depl-6c4669f4fc-tzvnc" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "nhw-depl-6c4669f4fc-tzvnc"
error when evicting pod "nhw-depl-6c4669f4fc-tzvnc" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "nhw-depl-6c4669f4fc-tzvnc"
error when evicting pod "nhw-depl-6c4669f4fc-tzvnc" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod "nhw-depl-6c4669f4fc-tzvnc"
...
❯ k get pods -owide
NAME                        READY   STATUS    RESTARTS   AGE     IP           NODE       NOMINATED NODE   READINESS GATES
nhw-depl-6c4669f4fc-8k5qr   0/1     Pending   0          4m47s   <none>       <none>     <none>           <none>
nhw-depl-6c4669f4fc-q5882   0/1     Pending   0          4m47s   <none>       <none>     <none>           <none>
nhw-depl-6c4669f4fc-tzvnc   1/1     Running   0          22m     172.17.0.2   minikube   <none>           <none>
We can see that the 2 replacement pods cannot find any node to be scheduled on, so they're stuck in the Pending state. The 3rd pod stays in the Running state because the PodDisruptionBudget is protecting it.
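The behaviour above comes down to simple arithmetic: an eviction is allowed only while the number of healthy pods is above minAvailable. The sketch below is a simplified model of that rule for an integer minAvailable, not the actual controller code (which also handles percentages and maxUnavailable); the function name allowed_disruptions is hypothetical.

```shell
# allowed_disruptions HEALTHY MIN_AVAILABLE
# Simplified model: the number of pods that may be evicted right now is
# the healthy pod count minus minAvailable, never below zero.
allowed_disruptions() {
  healthy=$1
  min_available=$2
  allowed=$((healthy - min_available))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

allowed_disruptions 3 1   # 3 Running pods, minAvailable: 1 -> prints 2
allowed_disruptions 1 1   # 2 evicted, replacements still Pending -> prints 0
```

Pending pods are not healthy, so after the first two evictions the budget is exhausted and the third eviction is refused until a replacement is running. On a live cluster, the current value can be read with kubectl get pdb nhw-pdb -o jsonpath='{.status.disruptionsAllowed}'.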
You can now cancel the drain by pressing Ctrl+C, then uncordon our single node to re-enable scheduling on it.
❯ kubectl get nodes
NAME       STATUS                     ROLES    AGE   VERSION
minikube   Ready,SchedulingDisabled   master   66d   v1.16.2
❯ kubectl uncordon minikube
node/minikube uncordoned
❯ kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
minikube   Ready    master   66d   v1.16.2
And that's done, our application is now fully protected against expected and unexpected downtime.
Next time your Kubernetes administrator is performing maintenance on the nodes, you'll be like: