Kubernetes (Part 2)

Setup

We will be using a virtual machine in the faculty's cloud.

When creating a virtual machine in the Launch Instance window:

  • Name your VM using the following convention: cc_lab<no>_<username>, where <no> is the lab number and <username> is your institutional account.
  • Select Boot from image in the Instance Boot Source section.
  • Select CC 2024-2025 in the Image Name section.
  • Select the m1.xlarge flavor.

In the virtual machine:

  • Download the laboratory archive into the work directory using wget https://repository.grid.pub.ro/cs/cc/laboratoare/lab-kubernetes-part-2.zip.
  • Extract the archive.
  • Run the setup script bash lab-kubernetes-part-2.sh.
$ # create the working dir
$ mkdir ~/work
$ # change the working dir
$ cd ~/work
$ # download the archive
$ wget https://repository.grid.pub.ro/cs/cc/laboratoare/lab-kubernetes-part-2.zip
$ unzip lab-kubernetes-part-2.zip
$ # run setup script; it may take a while
$ bash lab-kubernetes-part-2.sh

Create a local Kubernetes cluster using kind create cluster:

student@lab-kubernetes:~$ kind create cluster
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.23.4) 🖼
✓ Preparing nodes 📦
✓ Writing configuration 📜
✓ Starting control-plane 🕹️
✓ Installing CNI 🔌
✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Thanks for using kind! 🙂
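
Before moving on, you can sanity-check the cluster; besides the cluster-info command above, listing the nodes should show the single kind node as Ready (the exact node name and version on your machine may differ):

$ # the kind control-plane node should report STATUS Ready
$ kubectl get nodes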

Liveness probes

Software applications, no matter how well written and tested, are always prone to errors, crashes, deadlocks, etc. Sometimes, the only way to restore functionality is to restart the application.

When running in production, it is very important that application errors are detected as soon as they occur and then automatically mitigated.

In Kubernetes, we have the concept of liveness probes, which help us by continuously monitoring a container and taking an action if a failure occurs.

Setup: a crashy app

To illustrate the concept, we will use an app that was specially built for this lab. The app is a simple HTTP server written in Python that runs normally for a specified number of seconds and starts returning errors after that.

If you are curious, you can find the source code in ~/work/crashy-app/server.py. The time after which the app starts to error out is defined by the CRASH_AFTER environment variable.

The Docker image for this app should already exist:

student@cc-lab:~/work$ docker image ls
REPOSITORY   TAG     IMAGE ID       CREATED          SIZE
crashy-app   1.0.0   f0a327e2fc35   56 minutes ago   67MB
[...]

Let's load this image into the Kind cluster:

student@cc-lab:~/work$ kind load docker-image crashy-app:1.0.0
Image: "crashy-app:1.0.0" with ID "sha256:f0a327e2fc354173521a6425d679e3adaa95de11ca3b8e5306e8b58655f310e4" not yet present on node "kind-control-plane", loading...

We will create a deployment for this app and apply it. Notice that the CRASH_AFTER environment variable will be set to 60 seconds.

student@cc-lab:~/work$ cat crashy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crashy-app
  labels:
    app: crashy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: crashy
  template:
    metadata:
      labels:
        app: crashy
    spec:
      containers:
      - name: crashy-app
        image: crashy-app:1.0.0
        ports:
        - containerPort: 80
        env:
        - name: CRASH_AFTER
          value: "60"

student@cc-lab:~/work$ kubectl apply -f crashy-deployment.yaml
deployment.apps/crashy-app created

student@cc-lab:~/work$ kubectl get deployments
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
crashy-app   1/1     1            1           8s

student@cc-lab:~/work$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
crashy-app-5bc4d6474b-lgnk9   1/1     Running   0          11s

Let's expose the app via a service:

student@cc-lab:~/work$ cat crashy-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: crashy-app
spec:
  type: NodePort
  selector:
    app: crashy
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30080

student@cc-lab:~/work$ kubectl apply -f crashy-service.yaml
service/crashy-app created

student@cc-lab:~/work$ kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
crashy-app   NodePort    10.96.67.208   <none>        80:30080/TCP   6s
kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP        24m

Notice that at the beginning, the app works normally:

student@cc-lab:~/work$ curl http://172.18.0.2:30080
Hi, my name is crashy-app-5bc4d6474b-lgnk9 and I'm a crashy app...
But I didn't crash... yet :D

After 60 seconds, it starts to return errors:

student@cc-lab:~/work$ curl http://172.18.0.2:30080
Hi, my name is crashy-app-5bc4d6474b-lgnk9 and I'm a crashy app...
I crashed 2.85 seconds ago, sorry about that :(

If you use curl -v, you will see that the server returns an HTTP 500 status code.
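
If you only need the status code, curl can print it by itself. A small sketch, assuming the same node IP and port as above:

$ # prints 200 while the app is healthy, 500 after it crashes
$ curl -s -o /dev/null -w "%{http_code}\n" http://172.18.0.2:30080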

If we list the pods, the pod is still shown as running, so Kubernetes has no way of knowing that the app is not available.

The only way to recover is to delete the pod, which will force the deployment to create a new one. We can do this manually:

student@cc-lab:~/work$ kubectl delete pod/crashy-app-5bc4d6474b-lgnk9
pod "crashy-app-5bc4d6474b-lgnk9" deleted

student@cc-lab:~/work$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
crashy-app-5bc4d6474b-2svb4   1/1     Running   0          9s

But we will have to keep doing this again and again, which is not convenient.

Defining a liveness probe

A liveness probe helps us by periodically polling for a condition. When the condition fails, the container is automatically restarted.

We will be using an httpGet probe, which queries an HTTP endpoint of the app. Most cloud-native apps have a separate endpoint for health monitoring, which is more lightweight (it doesn't perform the full processing, but only returns the status of the service).

Our crashy app responds to the /health endpoint, which can also be queried manually:

student@cc-lab:~/work$ curl http://172.18.0.2:30080/health
200 OK
[...]
student@cc-lab:~/work$ curl http://172.18.0.2:30080/health
500 Internal Server Error

Let's edit the deployment manifest by defining a liveness probe:

student@cc-lab:~/work$ cat crashy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crashy-app
  labels:
    app: crashy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: crashy
  template:
    metadata:
      labels:
        app: crashy
    spec:
      containers:
      - name: crashy-app
        image: crashy-app:1.0.0
        ports:
        - containerPort: 80
        env:
        - name: CRASH_AFTER
          value: "60"
        livenessProbe:
          httpGet:
            path: /health
            port: 80
          periodSeconds: 1
          failureThreshold: 3
          terminationGracePeriodSeconds: 1

Note: the parameters have the following meaning:

  • httpGet.path - the path of the HTTP endpoint to probe
  • httpGet.port - the port of the HTTP endpoint to probe
  • periodSeconds - how many seconds to wait between two probes
  • failureThreshold - after how many failed probes is the container considered dead
  • terminationGracePeriodSeconds - how many seconds to wait before sending the KILL signal to a failed container

Apply the modified manifest:

student@cc-lab:~/work$ kubectl apply -f crashy-deployment.yaml 
deployment.apps/crashy-app configured
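
Optionally, you can confirm that the probe was picked up by querying the deployment spec; a quick check using jsonpath (the probe is printed as a one-line JSON object):

$ # show the liveness probe configured on the crashy-app container
$ kubectl get deployment crashy-app -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}'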

View the events for the pod and observe that the container is periodically restarted after three consecutive failed probes:

student@cc-lab:~/work$ kubectl events --for pod/crashy-app-5799b6fd57-sd56v --watch
LAST SEEN           TYPE      REASON      OBJECT                            MESSAGE
23s                 Normal    Scheduled   Pod/crashy-app-5799b6fd57-sd56v   Successfully assigned default/crashy-app-5799b6fd57-sd56v to kind-control-plane
22s                 Normal    Pulled      Pod/crashy-app-5799b6fd57-sd56v   Container image "crashy-app:1.0.0" already present on machine
22s                 Normal    Created     Pod/crashy-app-5799b6fd57-sd56v   Created container: crashy-app
22s                 Normal    Started     Pod/crashy-app-5799b6fd57-sd56v   Started container crashy-app
0s                  Warning   Unhealthy   Pod/crashy-app-5799b6fd57-sd56v   Liveness probe failed: HTTP probe failed with statuscode: 500
0s (x2 over 1s)     Warning   Unhealthy   Pod/crashy-app-5799b6fd57-sd56v   Liveness probe failed: HTTP probe failed with statuscode: 500
0s (x3 over 2s)     Warning   Unhealthy   Pod/crashy-app-5799b6fd57-sd56v   Liveness probe failed: HTTP probe failed with statuscode: 500
0s                  Normal    Killing     Pod/crashy-app-5799b6fd57-sd56v   Container crashy-app failed liveness probe, will be restarted
0s (x2 over 65s)    Normal    Pulled      Pod/crashy-app-5799b6fd57-sd56v   Container image "crashy-app:1.0.0" already present on machine
0s (x2 over 65s)    Normal    Created     Pod/crashy-app-5799b6fd57-sd56v   Created container: crashy-app
0s (x2 over 65s)    Normal    Started     Pod/crashy-app-5799b6fd57-sd56v   Started container crashy-app
0s (x4 over 65s)    Warning   Unhealthy   Pod/crashy-app-5799b6fd57-sd56v   Liveness probe failed: HTTP probe failed with statuscode: 500
0s (x5 over 66s)    Warning   Unhealthy   Pod/crashy-app-5799b6fd57-sd56v   Liveness probe failed: HTTP probe failed with statuscode: 500
0s (x6 over 67s)    Warning   Unhealthy   Pod/crashy-app-5799b6fd57-sd56v   Liveness probe failed: HTTP probe failed with statuscode: 500
0s (x2 over 65s)    Normal    Killing     Pod/crashy-app-5799b6fd57-sd56v   Container crashy-app failed liveness probe, will be restarted
0s (x3 over 2m10s)  Normal    Pulled      Pod/crashy-app-5799b6fd57-sd56v   Container image "crashy-app:1.0.0" already present on machine
0s (x3 over 2m10s)  Normal    Created     Pod/crashy-app-5799b6fd57-sd56v   Created container: crashy-app
0s (x3 over 2m10s)  Normal    Started     Pod/crashy-app-5799b6fd57-sd56v   Started container crashy-app
[...]
^C

The number of restarts can also be seen in the pod list:

student@cc-lab:~/work$ kubectl get pods
NAME                          READY   STATUS    RESTARTS     AGE
crashy-app-5799b6fd57-sd56v   1/1     Running   3 (3s ago)   3m19s

Verify using curl that the app automatically recovers after a failure.
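
One way to watch the whole crash-restart-recover cycle is a polling loop (a sketch; stop it with Ctrl+C):

$ # poll the app every 2 seconds; responses go from OK to errors and back to OK
$ while true; do curl -s http://172.18.0.2:30080; sleep 2; done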

Readiness probes

Production apps are often complex and are not ready to process traffic as soon as they are started. Usually, they need some time to initialize (seconds or even minutes). During this initialization time, traffic should not be routed to the respective instances, because it would not be processed anyway, and users would see errors.

In Kubernetes, we have the concept of readiness probes, which monitor a container and allow traffic to be routed to it only when it is ready.

Setup: a lazy app

To illustrate the concept, we will use an app that was specially built for this lab. The app is a simple HTTP server written in Python that takes a specified number of seconds to initialize, and runs normally after that.

If you are curious, you can find the source code in ~/work/lazy-app/server.py. The initialization time in seconds is a random number between zero and READY_AFTER_MAX.

The Docker image for this app should already exist:

student@cc-lab:~/work$ docker image ls
REPOSITORY   TAG     IMAGE ID       CREATED          SIZE
lazy-app     1.0.0   f7eac9e4eda7   42 minutes ago   67MB
[...]

Let's load this image into the Kind cluster:

student@cc-lab:~/work$ kind load docker-image lazy-app:1.0.0
Image: "lazy-app:1.0.0" with ID "sha256:f7eac9e4eda7cc3b492cdfe6aff791cfd763567fb0502d5c8bb96cbc0cf032ed" not yet present on node "kind-control-plane", loading...

We will create a deployment for this app and apply it. Notice that the READY_AFTER_MAX environment variable will be set to 300 seconds. The deployment will have 5 replicas, which means that there will be 5 pods that can serve requests.

student@cc-lab:~/work$ cat lazy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lazy-app
  labels:
    app: lazy
spec:
  replicas: 5
  selector:
    matchLabels:
      app: lazy
  template:
    metadata:
      labels:
        app: lazy
    spec:
      containers:
      - name: lazy-app
        image: lazy-app:1.0.0
        ports:
        - containerPort: 80
        env:
        - name: READY_AFTER_MAX
          value: "300"

student@cc-lab:~/work$ kubectl apply -f lazy-deployment.yaml
deployment.apps/lazy-app created

student@cc-lab:~/work$ kubectl get deployments
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
lazy-app   5/5     5            5           8s

student@cc-lab:~/work$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
lazy-app-674fb54b7d-9bckf   1/1     Running   0          4s
lazy-app-674fb54b7d-fsstv   1/1     Running   0          4s
lazy-app-674fb54b7d-hbsgg   1/1     Running   0          4s
lazy-app-674fb54b7d-tjddz   1/1     Running   0          4s
lazy-app-674fb54b7d-wxx7p   1/1     Running   0          4s

Let's expose the app via a service:

student@cc-lab:~/work$ cat lazy-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: lazy-app
spec:
  type: NodePort
  selector:
    app: lazy
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30081

student@cc-lab:~/work$ kubectl apply -f lazy-service.yaml
service/lazy-app created

student@cc-lab:~/work$ kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
kubernetes   ClusterIP   10.96.0.1      <none>        443/TCP        52m
lazy-app     NodePort    10.96.180.27   <none>        80:30081/TCP   38m

We can see that all 5 instances are shown as "ready", but if we try to connect using curl, we don't always get successful responses:

student@cc-lab:~/work$ curl http://172.18.0.2:30081
Hi, my name is lazy-app-7c44789765-cdsvk and I'm a lazy app...
Getting ready... 24.81 more seconds please :D

student@cc-lab:~/work$ curl http://172.18.0.2:30081
Hi, my name is lazy-app-7c44789765-rrpcj and I'm a lazy app...
Getting ready... 119.54 more seconds please :D

student@cc-lab:~/work$ curl http://172.18.0.2:30081
Hi, my name is lazy-app-7c44789765-cdsvk and I'm a lazy app...
Getting ready... 17.34 more seconds please :D

student@cc-lab:~/work$ curl http://172.18.0.2:30081
Hi, my name is lazy-app-7c44789765-sn2sh and I'm a lazy app...
Getting ready... 184.09 more seconds please :D

student@cc-lab:~/work$ curl http://172.18.0.2:30081
Hi, my name is lazy-app-7c44789765-rrpcj and I'm a lazy app...
Getting ready... 110.19 more seconds please :D

student@cc-lab:~/work$ curl http://172.18.0.2:30081
Hi, my name is lazy-app-7c44789765-nvfkv and I'm a lazy app...
But I'm finally ready! :)

student@cc-lab:~/work$ curl http://172.18.0.2:30081
Hi, my name is lazy-app-7c44789765-sn2sh and I'm a lazy app...
Getting ready... 178.67 more seconds please :D

student@cc-lab:~/work$ curl http://172.18.0.2:30081
Hi, my name is lazy-app-7c44789765-nvfkv and I'm a lazy app...
But I'm finally ready! :)

Depending on the pod where the request is routed, we will see a successful or a failed response. Ideally, the service would only route requests to pods that are ready.

Defining a readiness probe

A readiness probe helps us by periodically polling for a condition. When the condition is successful, the container is automatically marked as ready.

We will be using an httpGet probe, which queries an HTTP endpoint of the app. Most cloud-native apps have a separate endpoint for health monitoring, which is more lightweight (it doesn't perform the full processing, but only returns the status of the service).

Our lazy app responds to the /health endpoint, which can also be queried manually:

student@cc-lab:~/work$ curl http://172.18.0.2:30081/health
500 Internal Server Error
[...]
student@cc-lab:~/work$ curl http://172.18.0.2:30081/health
200 OK

First, let's delete the current deployment:

student@cc-lab:~/work$ kubectl delete deployments lazy-app
deployment.apps "lazy-app" deleted

Then, let's create a new deployment that defines a readiness probe:

student@cc-lab:~/work$ cat lazy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lazy-app
  labels:
    app: lazy
spec:
  replicas: 5
  selector:
    matchLabels:
      app: lazy
  template:
    metadata:
      labels:
        app: lazy
    spec:
      containers:
      - name: lazy-app
        image: lazy-app:1.0.0
        ports:
        - containerPort: 80
        env:
        - name: READY_AFTER_MAX
          value: "300"
        readinessProbe:
          httpGet:
            path: /health
            port: 80
          periodSeconds: 1
          successThreshold: 2

Note: the parameters have the following meaning:

  • httpGet.path - the path of the HTTP endpoint to probe
  • httpGet.port - the port of the HTTP endpoint to probe
  • periodSeconds - how many seconds to wait between two probes
  • successThreshold - after how many successful probes is the container considered ready

Apply the new manifest and observe that initially no pod is ready:

student@cc-lab:~/work$ kubectl apply -f lazy-deployment.yaml 
deployment.apps/lazy-app created

student@cc-lab:~/work$ kubectl get deployments
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
lazy-app   0/5     5            0           1s

student@cc-lab:~/work$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
lazy-app-6d55bd7894-jnkm6   0/1     Running   0          2s
lazy-app-6d55bd7894-qt5mm   0/1     Running   0          2s
lazy-app-6d55bd7894-wsncf   0/1     Running   0          2s
lazy-app-6d55bd7894-zdhtv   0/1     Running   0          1s
lazy-app-6d55bd7894-zkxgm   0/1     Running   0          2s

Verify with curl that requests are only routed to pods that are ready:

student@cc-lab:~/work$ curl http://172.18.0.2:30081
Hi, my name is lazy-app-7c44789765-nvfkv and I'm a lazy app...
But I'm finally ready! :)
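
Under the hood, a pod that is not ready is removed from the service's endpoints, so no traffic is routed to it. You can observe this directly; the list of endpoint IPs should grow as pods become ready:

$ # only ready pods are listed as endpoints of the service
$ kubectl get endpoints lazy-app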

Gradually, all of them become ready.

You can observe that by listing the pods:

student@cc-lab:~/work$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
lazy-app-6d55bd7894-jnkm6   1/1     Running   0          41s
lazy-app-6d55bd7894-qt5mm   0/1     Running   0          41s
lazy-app-6d55bd7894-wsncf   0/1     Running   0          41s
lazy-app-6d55bd7894-zdhtv   1/1     Running   0          40s
lazy-app-6d55bd7894-zkxgm   0/1     Running   0          41s

[...]

student@cc-lab:~/work$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
lazy-app-6d55bd7894-jnkm6   1/1     Running   0          5m56s
lazy-app-6d55bd7894-qt5mm   1/1     Running   0          5m56s
lazy-app-6d55bd7894-wsncf   1/1     Running   0          5m56s
lazy-app-6d55bd7894-zdhtv   1/1     Running   0          5m55s
lazy-app-6d55bd7894-zkxgm   1/1     Running   0          5m56s

Or by watching the deployment:

student@cc-lab:~/work$ kubectl get deployments --watch
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
lazy-app   0/5     5            0           2s
lazy-app   1/5     5            1           2m30s
lazy-app   2/5     5            2           2m36s
lazy-app   3/5     5            3           2m51s
lazy-app   4/5     5            4           3m38s
lazy-app   5/5     5            5           4m37s
^C

Scaling an app

In production, the amount of traffic for an app is rarely constant. If the traffic to our app increases, we may need to scale the app (create more pods, identical to the ones that already exist).

Let's start with the hello-app application, with only one replica.

Create and apply the deployment:

student@lab-kubernetes:~$ cat hello-app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
  labels:
    app: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello-app
        image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/hello-app:1.0
        ports:
        - containerPort: 8080

student@lab-kubernetes:~$ kubectl apply -f hello-app-deployment.yaml
deployment.apps/hello-app created

student@lab-kubernetes:~$ kubectl get deployments
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
hello-app   1/1     1            1           13s

student@lab-kubernetes:~$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
hello-app-599bb4bf7f-l45k4   1/1     Running   0          17s

Then create and apply the service that exposes the app:

student@lab-kubernetes:~$ cat hello-app-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-app
spec:
  type: NodePort
  selector:
    app: hello
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
    nodePort: 30082

student@lab-kubernetes:~$ kubectl apply -f hello-app-service.yaml
service/hello-app created

student@lab-kubernetes:~$ kubectl get services
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
hello-app    NodePort    10.96.186.102   <none>        8080:30082/TCP   7m42s
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP          20h

Now, let's scale hello-app to 10 pods. For this, change the value for replicas in hello-app-deployment.yaml to 10, and reapply the manifest:

student@lab-kubernetes:~$ kubectl apply -f hello-app-deployment.yaml
deployment.apps/hello-app configured

student@lab-kubernetes:~$ kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
hello-app-599bb4bf7f-25w8g   1/1     Running             0          6s
hello-app-599bb4bf7f-7xzgr   0/1     ContainerCreating   0          5s
hello-app-599bb4bf7f-gr9xb   1/1     Running             0          6s
hello-app-599bb4bf7f-l45k4   1/1     Running             0          44m
hello-app-599bb4bf7f-mbgx7   0/1     ContainerCreating   0          6s
hello-app-599bb4bf7f-ps2dj   1/1     Running             0          6s
hello-app-599bb4bf7f-r6xqv   1/1     Running             0          6s
hello-app-599bb4bf7f-rrnws   0/1     ContainerCreating   0          5s
hello-app-599bb4bf7f-tnqtz   1/1     Running             0          6s
hello-app-599bb4bf7f-wh7qx   0/1     ContainerCreating   0          6s

After a while, you'll see that all 10 pods are running. Also, the deployment shows 10 available pods:

student@lab-kubernetes:~$ kubectl get deployments
NAME        READY   UP-TO-DATE   AVAILABLE   AGE
hello-app   10/10   10           10          45m
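
As a side note, the replica count can also be changed imperatively, without editing the manifest; keep in mind that the next kubectl apply will reset it to the value from the file:

$ # scale the deployment directly from the command line
$ kubectl scale deployment hello-app --replicas=10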

Replica sets

What actually happened is that a Kubernetes replica set of scale 10, associated with the deployment, was created:

student@lab-kubernetes:~$ kubectl get replicasets
NAME                   DESIRED   CURRENT   READY   AGE
hello-app-599bb4bf7f   10        10        10      1m
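
The replica set is what keeps the number of pods at the desired value. You can see this self-healing behavior by deleting one pod and listing the pods again (the pod name below is taken from the earlier listing; yours will differ):

$ # delete one pod; the replica set immediately creates a replacement
$ kubectl delete pod hello-app-599bb4bf7f-l45k4
$ kubectl get pods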

Testing the scaled app

Connect multiple times to the service, using curl. You will notice that each time, a different pod responds:

student@lab-kubernetes:~$ curl http://172.18.0.2:30082
Hello, world!
Version: 1.0.0
Hostname: hello-app-599bb4bf7f-r6xqv
student@lab-kubernetes:~$ curl http://172.18.0.2:30082
Hello, world!
Version: 1.0.0
Hostname: hello-app-599bb4bf7f-gr9xb
student@lab-kubernetes:~$ curl http://172.18.0.2:30082
Hello, world!
Version: 1.0.0
Hostname: hello-app-599bb4bf7f-rrnws
student@lab-kubernetes:~$ curl http://172.18.0.2:30082
Hello, world!
Version: 1.0.0
Hostname: hello-app-599bb4bf7f-7xzgr
student@lab-kubernetes:~$ curl http://172.18.0.2:30082
Hello, world!
Version: 1.0.0
Hostname: hello-app-599bb4bf7f-ps2dj
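
To get a quick picture of how the requests are spread across pods, you can aggregate a batch of responses with standard shell tools (a sketch):

$ # send 20 requests and count how many times each pod responded
$ for i in $(seq 20); do curl -s http://172.18.0.2:30082 | grep Hostname; done | sort | uniq -c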

Autoscaling

In production, it is infeasible to manually scale an application up and down. Instead, we need a solution that does this automatically, as resource demands change.

In Kubernetes, we have the concept of horizontal pod autoscaler, which adds or removes pods from a replica set based on resource usage.

Defining resource constraints

First, let's delete the current hello-app deployment, if any:

student@cc-lab:~/work$ kubectl delete deployments hello-app
deployment.apps "hello-app" deleted

Then, let's create and apply a new deployment that defines resource constraints:

student@cc-lab:~/work$ cat hello-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
  labels:
    app: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello-app
        image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/hello-app:1.0
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 200m
          requests:
            cpu: 100m

student@cc-lab:~/work$ kubectl apply -f hello-deployment.yaml
deployment.apps/hello-app created

Note: the parameters have the following meaning:

  • resources.requests.cpu - minimum resources requested by the container (0.1 CPU cores in this case)
  • resources.limits.cpu - maximum resources the container is allowed to use (0.2 CPU cores in this case)

Installing the metrics server

In order for Kubernetes to measure resource utilization, we must install the metrics server, which is not installed by default in Kind.

We will use Helm, which is a package manager for Kubernetes.

Helm will be covered in another lab. For now, run the following commands to install the metrics server:

student@cc-lab:~/work$ helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
student@cc-lab:~/work$ helm repo update
student@cc-lab:~/work$ helm upgrade --install --set args={--kubelet-insecure-tls} metrics-server metrics-server/metrics-server --namespace kube-system
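
After a minute or so, the metrics server should begin reporting usage; until the first metrics are scraped, the commands below return an error:

$ # resource usage per node and per pod, as reported by the metrics server
$ kubectl top nodes
$ kubectl top pods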

Defining the autoscaling policy

Now, let's define and apply the horizontal pod autoscaler:

student@cc-lab:~/work$ cat hello-autoscaler.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hello
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 5

student@cc-lab:~/work$ kubectl apply -f hello-autoscaler.yaml
horizontalpodautoscaler.autoscaling/hello created

Note: the parameters have the following meaning:

  • minReplicas - minimum replicas to scale down to
  • maxReplicas - maximum replicas to scale up to
  • averageUtilization - when to scale; in this case, when the average CPU load across pods is greater than 5%

Also, inspect the horizontal pod autoscaler:

student@cc-lab:~/work$ kubectl get hpa
NAME    REFERENCE              TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
hello   Deployment/hello-app   cpu: 0%/5%   1         10        1          2m30s

Note: the values set for the resource limits and average utilization are unrealistically low, but we did this to make it easy to generate enough load.

Generating load

Open another terminal and run a while loop that sends curl requests:

student@cc-lab:~$ while true; do curl http://172.18.0.2:30082/; sleep 0.01; done

In the first terminal, inspect the horizontal pod autoscaler:

student@cc-lab:~/work$ kubectl get hpa --watch
NAME    REFERENCE              TARGETS       MINPODS   MAXPODS   REPLICAS   AGE
hello   Deployment/hello-app   cpu: 0%/5%    1         10        1          20m
hello   Deployment/hello-app   cpu: 2%/5%    1         10        1          21m
hello   Deployment/hello-app   cpu: 18%/5%   1         10        1          21m
hello   Deployment/hello-app   cpu: 16%/5%   1         10        4          21m
hello   Deployment/hello-app   cpu: 5%/5%    1         10        4          22m
hello   Deployment/hello-app   cpu: 4%/5%    1         10        4          23m
[...]

Observe how additional replicas have been automatically created:

student@cc-lab:~/work$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
hello-app-f447d7765-72sp8   1/1     Running   0          2m10s
hello-app-f447d7765-bwllr   1/1     Running   0          6m3s
hello-app-f447d7765-jr8kx   1/1     Running   0          2m10s
hello-app-f447d7765-v7lnq   1/1     Running   0          2m10s

Stopping the load

Stop the while loop from the other terminal. Continue to inspect the horizontal pod autoscaler:

student@cc-lab:~/work$ kubectl get hpa --watch
NAME    REFERENCE              TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
hello   Deployment/hello-app   cpu: 5%/5%   1         10        4          23m
hello   Deployment/hello-app   cpu: 3%/5%   1         10        4          24m
hello   Deployment/hello-app   cpu: 0%/5%   1         10        4          25m
hello   Deployment/hello-app   cpu: 0%/5%   1         10        4          27m
hello   Deployment/hello-app   cpu: 1%/5%   1         10        4          27m
hello   Deployment/hello-app   cpu: 0%/5%   1         10        4          28m
hello   Deployment/hello-app   cpu: 1%/5%   1         10        4          29m
hello   Deployment/hello-app   cpu: 0%/5%   1         10        4          29m
hello   Deployment/hello-app   cpu: 0%/5%   1         10        4          29m
hello   Deployment/hello-app   cpu: 1%/5%   1         10        3          29m
hello   Deployment/hello-app   cpu: 0%/5%   1         10        1          30m
hello   Deployment/hello-app   cpu: 1%/5%   1         10        1          30m
hello   Deployment/hello-app   cpu: 0%/5%   1         10        1          31m

Notice that after a few minutes, the instances have been scaled down to 1:

student@cc-lab:~/work$ kubectl get pods
NAME                        READY   STATUS    RESTARTS   AGE
hello-app-f447d7765-bwllr   1/1     Running   0          14m

Exercise - fine tuning

Try to tune the:

  • resources parameters (resources.requests.cpu and resources.limits.cpu)
  • autoscaler parameter (averageUtilization)
  • the way you generate traffic

in order to reach the maximum number of 10 instances.
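
If a single request loop cannot push the average CPU utilization high enough, one option is to run several loops in parallel (a sketch; tune the number of loops and the sleep interval as needed):

$ # start 8 request loops in the background to generate more load
$ for i in $(seq 8); do while true; do curl -s http://172.18.0.2:30082/ > /dev/null; done & done
$ # when you are done, stop all background loops
$ kill $(jobs -p)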

Ingress

Even if we can expose Kubernetes apps using services, each service runs on a different port. If we want a single point of access to all apps in the Kubernetes cluster, we can use an Ingress.

An Ingress is a Kubernetes object that acts like an API gateway. Each service can be accessed using a different HTTP resource path.

Setting up an ingress on the Kind cluster

Kind doesn't have the full Ingress functionality by default, so we have to install some dependencies.

First, let's install the Ingress controller functionality:

student@cc-lab:~/work$ kubectl apply -f https://kind.sigs.k8s.io/examples/ingress/deploy-ingress-nginx.yaml
namespace/ingress-nginx created
serviceaccount/ingress-nginx created
serviceaccount/ingress-nginx-admission created
role.rbac.authorization.k8s.io/ingress-nginx created
role.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrole.rbac.authorization.k8s.io/ingress-nginx created
clusterrole.rbac.authorization.k8s.io/ingress-nginx-admission created
rolebinding.rbac.authorization.k8s.io/ingress-nginx created
rolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx created
clusterrolebinding.rbac.authorization.k8s.io/ingress-nginx-admission created
configmap/ingress-nginx-controller created
service/ingress-nginx-controller created
service/ingress-nginx-controller-admission created
deployment.apps/ingress-nginx-controller created
job.batch/ingress-nginx-admission-create created
job.batch/ingress-nginx-admission-patch created
ingressclass.networking.k8s.io/nginx created
validatingwebhookconfiguration.admissionregistration.k8s.io/ingress-nginx-admission created

Then, we must install the cloud provider add-on for Kind. Go to https://github.com/kubernetes-sigs/cloud-provider-kind/releases and download the archive for the Linux AMD64 architecture.

Extract it, and run the executable in a different terminal. Keep it running.

student@cc-lab-alexandru-carp:~$ wget https://github.com/kubernetes-sigs/cloud-provider-kind/releases/download/v0.6.0/cloud-provider-kind_0.6.0_linux_amd64.tar.gz
[...]

student@cc-lab-alexandru-carp:~$ tar -xvf cloud-provider-kind_0.6.0_linux_amd64.tar.gz
LICENSE
README.md
cloud-provider-kind

student@cc-lab-alexandru-carp:~$ ./cloud-provider-kind
[...]

Configuring another service

Let's configure another service, similar to the hello-app one, so that the ingress will route traffic to both services. This time, we will use the hello-app:2.0 image.

Create and apply the second deployment:

student@cc-lab:~/work$ cat hello-deployment-v2.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app-v2
  labels:
    app: hello-v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-v2
  template:
    metadata:
      labels:
        app: hello-v2
    spec:
      containers:
      - name: hello-app
        image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/hello-app:2.0
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 200m
          requests:
            cpu: 100m

student@cc-lab:~/work$ kubectl apply -f hello-deployment-v2.yaml
deployment.apps/hello-app-v2 created

student@cc-lab:~/work$ kubectl get deployments
NAME           READY   UP-TO-DATE   AVAILABLE   AGE
hello-app      1/1     1            1           54m
hello-app-v2   1/1     1            1           12s

student@cc-lab:~/work$ kubectl get pods
NAME                            READY   STATUS    RESTARTS   AGE
hello-app-f447d7765-bwllr       1/1     Running   0          35m
hello-app-v2-5b9fbc5465-wr6nr   1/1     Running   0          5s

Create and apply the second service:

student@cc-lab:~/work$ cat hello-service-v2.yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-app-v2
spec:
  type: NodePort
  selector:
    app: hello-v2
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
    nodePort: 30083

student@cc-lab:~/work$ kubectl apply -f hello-service-v2.yaml
service/hello-app-v2 created

student@cc-lab:~/work$ kubectl get services
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
hello-app      NodePort    10.96.172.199   <none>        8080:30082/TCP   55m
hello-app-v2   NodePort    10.96.33.190    <none>        8080:30083/TCP   3s
kubernetes     ClusterIP   10.96.0.1       <none>        443/TCP          162m

Defining the ingress

We will define an ingress so that:

  • /v1 path will point to the service for hello-app:1.0
  • /v2 path will point to the service for hello-app:2.0

student@cc-lab:~/work$ cat hello-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-ingress
spec:
  rules:
  - http:
      paths:
      - pathType: Prefix
        path: /v1
        backend:
          service:
            name: hello-app
            port:
              number: 8080
      - pathType: Prefix
        path: /v2
        backend:
          service:
            name: hello-app-v2
            port:
              number: 8080

student@cc-lab:~/work$ kubectl apply -f hello-ingress.yaml
ingress.networking.k8s.io/hello-ingress configured

student@cc-lab:~/work$ kubectl describe ingress hello-ingress
Name:             hello-ingress
Labels:           <none>
Namespace:        default
Address:
Ingress Class:    <none>
Default backend:  <default>
Rules:
  Host        Path  Backends
  ----        ----  --------
  *
              /v1   hello-app:8080 (10.244.0.50:8080)
              /v2   hello-app-v2:8080 (10.244.0.54:8080)
Annotations:  <none>
Events:       <none>

Testing traffic routing

To identify the IP address associated with the ingress, we must inspect the services in the ingress-nginx namespace:

student@cc-lab:~/work$ kubectl get --namespace ingress-nginx services
NAME                                 TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             LoadBalancer   10.96.75.175   172.18.0.3    80:32191/TCP,443:32729/TCP   5m45s
ingress-nginx-controller-admission   ClusterIP      10.96.116.85   <none>        443/TCP

In our case, the IP address is 172.18.0.3.

Let's test traffic routing with curl:

student@cc-lab:~/work$ curl 172.18.0.3/v1
Hello, world!
Version: 1.0.0
Hostname: hello-app-f447d7765-bwllr

student@cc-lab:~/work$ curl 172.18.0.3/v2
Hello, world!
Version: 2.0.0
Hostname: hello-app-v2-5b9fbc5465-wr6nr
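
A request that matches neither rule falls through to the controller's default backend, which should return HTTP 404; this is a quick way to confirm that only /v1 and /v2 are routed:

$ # no rule matches /, so the default backend answers with 404 Not Found
$ curl 172.18.0.3/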