OpenShift
Setup
We will be using a virtual machine in the faculty's cloud.
When creating a virtual machine in the Launch Instance window:
- Name your VM using the following convention: cc_lab<no>_<username>, where <no> is the lab number and <username> is your institutional account.
- Select Boot from image in the Instance Boot Source section.
- Select CC 2024-2025 in the Image Name section.
- Select the m1.xlarge flavor.
In the base virtual machine:
- Download the laboratory archive using:
wget https://repository.grid.pub.ro/cs/cc/laboratoare/lab-openshift.zip
- Extract the archive.
- Run the setup script: bash lab-openshift.sh
$ # download the archive
$ wget https://repository.grid.pub.ro/cs/cc/laboratoare/lab-openshift.zip
$ unzip lab-openshift.zip
$ # run setup script; it may take a while
$ bash lab-openshift.sh
Running Applications on OpenShift
Connecting to OpenShift
OpenShift is Red Hat's container application platform that provides a secure and scalable foundation for building, deploying, and managing containerized applications. It's a Kubernetes distribution with added features for enterprise use, including automated operations, developer workflows, and comprehensive security capabilities. OpenShift extends Kubernetes with developer-focused tools that make it easier to manage applications throughout their lifecycle.
The UPB OpenShift deployment is available at the following link: https://console-openshift-console.apps.ocp-demo.grid.pub.ro This deployment is used so that you can work with a real-world cluster deployment, with its limitations and advantages.
You will be running commands against the OpenShift cluster using its dedicated CLI tool, oc. The oc command offers a superset of the kubectl command, meaning that from the point of view of Cloud Computing we can consider it an alias.
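For example, once you are logged in (see below), any kubectl-style command can be issued through oc; the pairs below are only an illustration and behave identically:
$ oc get pods          # equivalent to: kubectl get pods
$ oc get deployments   # equivalent to: kubectl get deployments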
A user has to log in using the CLI in order to use the oc command.
We have to generate a token which we will use to log in.
To create the token, connect to the OpenShift dashboard: https://console-openshift-console.apps.ocp-demo.grid.pub.ro
From the OpenShift dashboard, press the button containing your name and select Copy login command.
On the next page, press the Display token link, which will display a command that you have to copy and paste into your terminal.
The command looks like this:
sergiu@epsilon:~/cc-workspace/curs-08$ oc login --token=sha256~asdlfkjhlkadsf23hj4l --server=https://api.ocp-demo.grid.pub.ro:6443
WARNING: Using insecure TLS client config. Setting this option is not supported!
Logged into "https://api.ocp-demo.grid.pub.ro:6443" as "sergiu.weisz" using the token provided.
You don't have any projects. You can try to create a new project, by running
oc new-project <projectname>
Instead of referring to namespaces directly, OpenShift uses the concept of projects. To create a namespace for ourselves in the infrastructure, we have to run the following command:
sergiu@epsilon:~/ocp/upgrade$ oc new-project sergiu-weisz-openshift
Now using project "sergiu-weisz-openshift" on server "https://api.ocp-demo.grid.pub.ro:6443".
You can add applications to this project with the 'new-app' command. For example, try:
oc new-app rails-postgresql-example
to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:
kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname
To switch to a project (namespace) we can use the oc command as follows:
sergiu@epsilon:~/cc-workspace/curs-08$ oc project sergiu-weisz-prj
Now using project "sergiu-weisz-prj" on server "https://api.ocp-demo.grid.pub.ro:6443".
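If you are not sure which project you are currently in, or which projects you have access to, the following oc commands can help:
$ oc project     # print the project currently in use
$ oc projects    # list the projects you have access to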
Deploying Ollama on OpenShift
Ollama is a tool which provides easy access to LLMs that can be run on our own private or public clouds instead of SaaS infrastructures. The advantages of running an LLM locally are as follows:
- you do not pay subscription fees for the service
- you can use already available hardware at no added cost
- all your queries stay local; nothing is reported or added to any online profile of you
Together with Ollama we will be deploying Open WebUI, a dashboard which connects to the running Ollama instance and provides a friendly user interface for running queries.
We will be adapting the following tutorial to run on our OpenShift cluster: https://gautam75.medium.com/deploy-ollama-and-open-webui-on-openshift-c88610d3b5c7. We will not be using it directly, because we do not wish to allocate PersistentVolumes for a temporary use case such as a lab context.
We will be deploying the Ollama pods together with a service which will be receiving the queries. Apply the following manifest to your cluster:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ cat ollama.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          volumeMounts:
            - name: ollama-data
              mountPath: /.ollama
          tty: true
      volumes:
        - name: ollama-data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
  selector:
    app: ollama
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc apply -f ollama.yaml
deployment.apps/ollama created
service/ollama created
We will check that the deployment has been created and that the pod has been launched:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get deployment
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
ollama   1/1     1            1           50s
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get pods
NAME                      READY   STATUS    RESTARTS   AGE
ollama-76f696875f-6svtp   1/1     Running   0          59s
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get services
NAME     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
ollama   ClusterIP   172.30.141.96   <none>        11434/TCP   5m26s
We can now interact with the service by port-forwarding it to our local machine and sending a curl request to it:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc port-forward svc/ollama 11434:11434 &
Forwarding from 127.0.0.1:11434 -> 11434
Forwarding from [::1]:11434 -> 11434
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ curl localhost:11434
Ollama is running
We will be interacting with Ollama through the CLI by running commands directly in the container, using the command below.
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc exec -it ollama-bb4ff999c-5w9fk -- /bin/bash
groups: cannot find name for group ID 1000800000
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama pull llama3.2:3b
The ollama command is used inside the pod to pull a model as follows:
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama pull llama3.2:3b
pulling manifest
<...>
verifying sha256 digest
writing manifest
success
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama list
NAME          ID             SIZE     MODIFIED
llama3.2:3b   a80c4f17acd5   2.0 GB   15 minutes ago
We used the ollama list command above to see how much disk space our model is using.
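Before moving on to the GUI, you can optionally query the model straight from the container's CLI; the prompt text below is just an illustration:
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama run llama3.2:3b "Say hello in one short sentence."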
While we can interact with Ollama from the command line, we want to use a GUI application to make it easier to run queries and to offer the application to other users. We will be using the Open WebUI project, which we will configure to connect to the Ollama service created earlier:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ cat open-webui.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080
          env:
            - name: OLLAMA_BASE_URL
              value: "http://ollama:11434"
            - name: WEBUI_SECRET_KEY
              value: "your-secret-key"
          volumeMounts:
            - name: webui-data
              mountPath: /app/backend/data
      volumes:
        - name: webui-data
          emptyDir: {}
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
  selector:
    app: open-webui
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc apply -f open-webui.yaml
deployment.apps/open-webui created
service/open-webui created
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get deployment
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
ollama       1/1     1            1           30m
open-webui   1/1     1            1           118s
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get pods
NAME                          READY   STATUS    RESTARTS   AGE
ollama-bb4ff999c-5w9fk        1/1     Running   0          27m
open-webui-7584f79cb6-wdqqz   1/1     Running   0          2m2s
To connect from the outside world to our Open WebUI we can create a route, which can be accessed externally. An OpenShift route works like an Ingress in regular Kubernetes: it creates an HTTP ingress point which redirects traffic from the router to a selected service. We will be creating the following route:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc create route edge --service open-webui
route.route.openshift.io/open-webui created
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get routes.route.openshift.io open-webui -o json | jq -r '.spec.host' | sed 's/^/https:\/\//'
https://open-webui-sergiu-weisz-prj.apps.ocp-demo.grid.pub.ro
The last command gives us the link from which we can access the Open WebUI. Configure the connection and try it out!
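Before opening the link in a browser, you can optionally check that the route answers; the hostname below is the one printed above for this example project, so adjust it for your own (the -k flag skips TLS verification in case the certificate is not trusted locally):
$ curl -k -I https://open-webui-sergiu-weisz-prj.apps.ocp-demo.grid.pub.ro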
DIY: DeepSeek R1 7b
After testing the Open WebUI, download the DeepSeek R1 7b quantized model for Ollama. You can search the Ollama library for it: https://ollama.com/library
You can download the model using the same command as above. You do not need to create a new deployment.
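For reference, the pull command should be similar to the one used for Llama; check the exact model tag on the Ollama library page (the tag below is an assumption):
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama pull deepseek-r1:7b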
Scheduling Jobs
In the context of cloud computing, up until now we have only interacted with applications or services whose lifetime is infinite, which means that they are started and are never stopped unless an error appears.
This does not cover most use cases in distributed computing, though. In many cases processing steps are handled in distinct chunks which are launched and executed by a scheduler. Kubernetes by its nature works as a scheduler, which makes it well suited for scheduling processing jobs.
A Kubernetes Job would be used instead of a Pod when we expect that the action will finish and we do not want the resources of a Pod to linger in the cluster. We have noticed from the liveness probes lab that when a Pod stops it doesn't just shut down, it can be restarted indefinitely, which does not match our discrete workload model.
The object which manages a discrete work item in Kubernetes is called a Job, and it contains a specification for a container, as we are used to from Pod specifications.
The example below shows a Job which prints a debug message:
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-world-job
spec:
  template:
    spec:
      containers:
        - name: hello-world
          image: ghcr.io/containerd/busybox
          command: ["echo", "Hello from Kubernetes batch job!"]
      restartPolicy: Never
  backoffLimit: 4
When applying the above manifest, we can see that the Job is created, and we can inspect its output as follows:
sergiu@epsilon:~/cc-workspace/curs-09$ oc apply -f hello-world.yaml
job.batch/hello-world-job created
sergiu@epsilon:~/cc-workspace/curs-09$ oc get jobs
NAME              COMPLETIONS   DURATION   AGE
hello-world-job   0/1           0s         0s
sergiu@epsilon:~/cc-workspace/curs-09$ oc logs job/hello-world-job
Hello from Kubernetes batch job!
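Note that the job may not have finished by the time you run oc logs (the COMPLETIONS column above still shows 0/1). If the log comes back empty, you can first wait for the job to complete, for example:
$ oc wait --for=condition=complete job/hello-world-job --timeout=60s
$ oc logs job/hello-world-job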
The above example is useful for quick and dirty jobs, but when running in an actual batch environment there are some other factors which have to be taken into account:
- to increase scheduling accuracy and system cohesion, you would add resource limits;
- use a custom job script;
- add fail conditions;
- limit job duration.
The following example creates a more complex Job which runs a custom Python script, limits its resources and requests a restart if the application fails:
apiVersion: batch/v1
kind: Job
metadata:
  name: matrix-multiplication-job
spec:
  template:
    spec:
      containers:
        - name: matrix-multiply
          image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/python:3.9-slim
          command: ["bash", "-c"]
          args:
            - |
              pip install numpy && python /scripts/matrix_multiply.py
          volumeMounts:
            - name: script-volume
              mountPath: /scripts
            - name: pip-local
              mountPath: /.local
            - name: pip-local
              mountPath: /.cache
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
      volumes:
        - name: script-volume
          configMap:
            name: matrix-multiplication-script
        - name: pip-local
          emptyDir: {}
      restartPolicy: OnFailure
  backoffLimit: 2
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: matrix-multiplication-script
data:
  matrix_multiply.py: |
    import numpy as np
    import time
    import os
    # Create large matrices
    size = 5000
    print(f'Creating {size}x{size} matrices...')
    a = np.random.rand(size, size)
    b = np.random.rand(size, size)
    # Perform CPU-intensive matrix multiplication
    print('Starting matrix multiplication...')
    start_time = time.time()
    result = np.matmul(a, b)
    duration = time.time() - start_time
    print(f'Matrix multiplication complete in {duration:.2f} seconds')
    print(f'Result matrix shape: {result.shape}')
The requests dict is used for scheduling purposes: it describes the minimum resources that have to be free on a node for the container to be placed there. The limits dict specifies the actual limits imposed on the container, which it cannot surpass.
As with a regular Pod, ConfigMaps, Secrets and other Kubernetes objects can be mounted into the container.
Let's run it and see its output:
sergiu@epsilon:~/cc-workspace/curs-09$ oc logs job/matrix-multiplication-job
Collecting numpy
Downloading numpy-2.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.5/19.5 MB 101.7 MB/s eta 0:00:00
Installing collected packages: numpy
Successfully installed numpy-2.0.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 23.0.1 -> 25.1.1
[notice] To update, run: pip install --upgrade pip
Creating 5000x5000 matrices...
Starting matrix multiplication...
Matrix multiplication complete in 14.20 seconds
Result matrix shape: (5000, 5000)
Case study: zip cracking
Let's look at a real-world example of cracking a password using fcrackzip and Kubernetes Jobs.
The decrypt-zip.yaml file is the basis for our job. It contains the commands used for cracking the password of a ZIP file. The fcrackzip tool can brute-force a ZIP archive's password. Our task is to download the archive and crack its password.
The following manifest defines our Job and its volumes:
apiVersion: batch/v1
kind: Job
metadata:
  name: zip-decryption-job
  labels:
    app: zip-decryption
spec:
  ttlSecondsAfterFinished: 86400 # Automatically delete job 24h after completion
  backoffLimit: 2 # Number of retries before considering job failed
  template:
    metadata:
      labels:
        app: zip-decryption
    spec:
      restartPolicy: OnFailure
      initContainers:
        - name: download-zip
          image: ghcr.io/curl/curl-container/curl:master # Lightweight curl image
          command: ["/bin/sh", "-c"]
          volumeMounts:
            - name: data-volume
              mountPath: /data
          args:
            - >
              echo "Downloading ZIP file from remote source..." &&
              curl http://swarm.cs.pub.ro/~sweisz/encrypted.zip -o /data/encrypted.zip
      containers:
        - name: hashcat-container
          image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/fcrackzip # Image containing the fcrackzip tool
          command: ["/bin/sh"]
          args:
            - "-c"
            - >
              cd /data &&
              fcrackzip -v -b -c a -l 5-5 -u encrypted.zip > results_lowercase.txt &&
              cat results_lowercase.txt
          volumeMounts:
            - name: data-volume
              mountPath: /data
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
      volumes:
        - name: data-volume
          emptyDir: {}
        - name: wordlist-volume
          configMap:
            name: zip-decrypt-config
We know that the file has a password made up of 5 letters, which led us to use the -l 5-5 option, together with -b to do brute-forcing.
We use the initContainer to download the archive and the main container to run fcrackzip.
Exercise: Crack using wordlist
Change the above job in order to run fcrackzip using the wordlist from the following link: http://swarm.cs.pub.ro/~sweisz/wordlist.txt.
You can attach the wordlist as a ConfigMap as you've seen in the matrix multiplication example.
You can see how to configure fcrackzip to use wordlists in the following link: https://sohvaxus.github.io/content/fcrackzip-bruteforce-tutorial.html.
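A possible starting point (a sketch, not the full solution) is to create the ConfigMap referenced by the wordlist-volume in the manifest above and switch fcrackzip to dictionary mode; remember that you also have to mount wordlist-volume into the main container, and note that the /wordlist mount path below is only a suggestion:
$ wget http://swarm.cs.pub.ro/~sweisz/wordlist.txt
$ oc create configmap zip-decrypt-config --from-file=wordlist.txt
$ # inside the container, the dictionary attack would then look like:
$ # fcrackzip -v -D -p /wordlist/wordlist.txt -u encrypted.zip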
Cronjobs
While regular Jobs are useful from a scheduling point of view, they cannot be set to run periodically or on a timer. CronJobs are a mechanism implemented in Kubernetes to enhance the regular Job feature. They are a type of Job which is managed and scheduled by Kubernetes to run at specific times based on a user-defined rule.
Some use cases which we can define for CronJobs are:
- scheduling regular data exports or backups to off-site facilities
- periodic environment cleanup jobs, for example deleting temporary files or files which have been generated and haven't been used for some time
- crawling endpoints for new data or information
The following is an example manifest for a CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: first-job
spec:
  schedule: "0 2 8 * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: first-job
              image: busybox
              command: ["echo", "First job"]
          restartPolicy: OnFailure
The jobTemplate specification works like a regular Job specification, in which we add the requirements for a job.
The schedule value is specified using the following convention from the cron manual:
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
This means that the above job will run on the 8th day of the month at 2:00 AM. If we want to specify a job which runs every minute, we could make the following change:
- schedule: "0 2 8 * *"
+ schedule: "*/1 * * * *"
The */x syntax means the job will run every x minutes.
For an easy way to define the cron schedule, you can use https://crontab.guru/.
Case study: Database backup
For this case study we will be running a PostgreSQL database server defined by the following manifest:
# PostgreSQL Pod
apiVersion: v1
kind: Pod
metadata:
  name: postgres-db
  labels:
    app: postgres
spec:
  containers:
    - name: postgres
      image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/postgres:14-alpine
      ports:
        - containerPort: 5432
          name: postgres
      env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pg/
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: password
        - name: POSTGRES_DB
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: database
      volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data/
  volumes:
    - name: postgres-data
      emptyDir: {}
---
# Service for PostgreSQL
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
spec:
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres
The pgsql.yaml file deploys a database server. For this database server we need to create backups which will be stored in another volume and then shipped off-site.
In order to prepare the setup, we first need to create the database that we will be backing up. Run the following command in the lab directory to set up the database deployment and service:
oc apply -f pgsql.yaml
We will start from the following already created CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup-container
              image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/postgres:14-alpine
              command:
                - /bin/sh
                - -c
                - |
                  # Set date format for backup filename
                  BACKUP_DATE=$(date +\%Y-\%m-\%d-\%H\%M)
                  # Create backup
                  echo "Starting PostgreSQL backup at $(date)"
                  mkdir /tmp/backups
                  pg_dump \
                    -h ${DB_HOST} \
                    -U ${DB_USER} \
                    -d ${DB_NAME} \
                    -F custom \
                    -Z 9 \
                    -f /tmp/backups/${DB_NAME}-${BACKUP_DATE}.pgdump
              env:
                - name: DB_HOST
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: host
                - name: DB_USER
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: username
                - name: DB_NAME
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: database
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: password
          restartPolicy: OnFailure
---
# Secret for database credentials
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
type: Opaque
data:
  host: cG9zdGdyZXMtc2VydmljZQ== # postgres-service (base64 encoded)
  username: YmFja3VwX3VzZXI= # backup_user (base64 encoded)
  password: c2VjdXJlUGFzc3dvcmQxMjM= # securePassword123 (base64 encoded)
  database: cHJvZHVjdGlvbl9kYg== # production_db (base64 encoded)
The above CronJob creates a backup of the database using pg_dump and puts it in a temporary location.
Apply the manifests so we can see the backup in action.
sergiu@epsilon:~/ocp/upgrade$ oc get cronjobs
NAME              SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
postgres-backup   */1 * * * *   False     0        35s             39m
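Each scheduled run spawns a regular Job and a pod, so you can check the result of a backup run with the usual commands; the pod name is generated, so replace it with the one shown in your project:
$ oc get jobs
$ oc get pods
$ oc logs <postgres-backup-pod-name>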
The issue with the above CronJob is that although it creates a backup file, it doesn't add it to any kind of persistent storage.
Create an emptyDir volume, mount it at the /backup path, and change the backup script so that it copies the backup files to the backup volume.
Change the backup schedule so that it only does a backup every hour.
Change the policy so that only one backup job can run at a time. Look into the documentation to see how to disallow concurrent jobs: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/.
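As a hint for the last two tasks, the relevant fields live directly under the CronJob spec; the values below are one valid choice, not the only one:
spec:
  schedule: "0 * * * *"        # run at minute 0 of every hour
  concurrencyPolicy: Forbid    # do not start a new run while one is still active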