OpenShift
Setup
We will be using a virtual machine in the faculty's cloud.
When creating a virtual machine in the Launch Instance window:
- Name your VM using the following convention: cc_lab<no>_<username>, where <no> is the lab number and <username> is your institutional account.
- Select Boot from image in the Instance Boot Source section.
- Select CC 2024-2025 in the Image Name section.
- Select the m1.xlarge flavor.
In the base virtual machine:
- Download the laboratory archive using:
wget https://repository.grid.pub.ro/cs/cc/laboratoare/lab-openshift.zip
- Extract the archive.
- Run the setup script: bash lab-openshift.sh
$ # download the archive
$ wget https://repository.grid.pub.ro/cs/cc/laboratoare/lab-openshift.zip
$ unzip lab-openshift.zip
$ # run setup script; it may take a while
$ bash lab-openshift.sh
Running Applications on OpenShift
Connecting to OpenShift
OpenShift is Red Hat's container application platform that provides a secure and scalable foundation for building, deploying, and managing containerized applications. It's a Kubernetes distribution with added features for enterprise use, including automated operations, developer workflows, and comprehensive security capabilities. OpenShift extends Kubernetes with developer-focused tools that make it easier to manage applications throughout their lifecycle.
The UPB OpenShift deployment is available at the following link: https://console-openshift-console.apps.ocp-demo.grid.pub.ro This deployment is used so that you can work with a real-world cluster deployment, with its limitations and advantages.
You will be running commands against the OpenShift cluster using its dedicated CLI tool, oc. The oc command offers a superset of the kubectl command, meaning that from the point of view of Cloud Computing we can consider it an alias.
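For example, once you are logged in (see below), any kubectl-style command can be issued through oc; the pairs below are only an illustration and behave identically:
$ oc get pods          # equivalent to: kubectl get pods
$ oc get deployments   # equivalent to: kubectl get deployments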
A user has to log in using the CLI in order to use the oc command.
We have to generate a token which we will use to log in.
To create the token, connect to the OpenShift dashboard: https://console-openshift-console.apps.ocp-demo.grid.pub.ro
From the OpenShift dashboard, press the button containing your name and select Copy login command.
On the next page, press the Display token link, which will display a command that you have to copy and paste into your terminal.
The command looks like this:
sergiu@epsilon:~/cc-workspace/curs-08$ oc login --token=sha256~asdlfkjhlkadsf23hj4l --server=https://api.ocp-demo.grid.pub.ro:6443
WARNING: Using insecure TLS client config. Setting this option is not supported!
Logged into "https://api.ocp-demo.grid.pub.ro:6443" as "sergiu.weisz" using the token provided.
You don't have any projects. You can try to create a new project, by running
oc new-project <projectname>
Instead of referring to namespaces directly, OpenShift uses the concept of projects. To create a namespace for ourselves in the infrastructure, we have to run the following command:
sergiu@epsilon:~/ocp/upgrade$ oc new-project sergiu-weisz-openshift
Now using project "sergiu-weisz-openshift" on server "https://api.ocp-demo.grid.pub.ro:6443".
You can add applications to this project with the 'new-app' command. For example, try:
oc new-app rails-postgresql-example
to build a new example application in Ruby. Or use kubectl to deploy a simple Kubernetes application:
kubectl create deployment hello-node --image=registry.k8s.io/e2e-test-images/agnhost:2.43 -- /agnhost serve-hostname
To switch to a project (namespace) we can use the oc command as follows:
sergiu@epsilon:~/cc-workspace/curs-08$ oc project sergiu-weisz-prj
Now using project "sergiu-weisz-prj" on server "https://api.ocp-demo.grid.pub.ro:6443".
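If you are not sure which project you are currently in, or which projects you have access to, the following oc commands can help:
$ oc project     # print the project currently in use
$ oc projects    # list the projects you have access to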
Deploying Ollama on OpenShift
Ollama is a tool which provides easy access to LLMs that can be run on our own private or public clouds instead of SaaS infrastructures. The advantages of running an LLM locally are as follows:
- you do not pay subscription fees for the service
- you can use already available hardware at no added cost
- all your queries stay local; nothing is reported or added to any online profile of you
Together with Ollama we will be deploying Open WebUI, a dashboard which connects to the running Ollama instance and provides a friendly user interface for running queries.
We will be adapting the following tutorial to run on our OpenShift cluster: https://gautam75.medium.com/deploy-ollama-and-open-webui-on-openshift-c88610d3b5c7. We will not be using it directly, because we do not wish to allocate PersistentVolumes for a temporary use case such as a lab context.
We will be deploying the Ollama pods together with a service which will be receiving the queries. Apply the following manifest to your cluster:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ cat ollama.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434
          volumeMounts:
            - name: ollama-data
              mountPath: /.ollama
          tty: true
      volumes:
        - name: ollama-data
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: ollama
spec:
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
  selector:
    app: ollama
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc apply -f ollama.yaml
deployment.apps/ollama created
service/ollama created
We will check that the deployment has been created and that the pod has been launched:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get deployment
NAME     READY   UP-TO-DATE   AVAILABLE   AGE
ollama   1/1     1            1           50s
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get pods
NAME                      READY   STATUS    RESTARTS   AGE
ollama-76f696875f-6svtp   1/1     Running   0          59s
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get services
NAME     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
ollama   ClusterIP   172.30.141.96   <none>        11434/TCP   5m26s
We can now interact with the service by port-forwarding it to our local machine and sending a curl request to it:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc port-forward svc/ollama 11434:11434 &
Forwarding from 127.0.0.1:11434 -> 11434
Forwarding from [::1]:11434 -> 11434
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ curl localhost:11434
Ollama is running
We will be interacting with Ollama through the CLI by running commands directly in the container, using the command below.
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc exec -it ollama-bb4ff999c-5w9fk -- /bin/bash
groups: cannot find name for group ID 1000800000
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama pull llama3.2:3b
The ollama command is used inside the pod to pull a model as follows:
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama pull llama3.2:3b
pulling manifest
<...>
verifying sha256 digest
writing manifest
success
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama list
NAME          ID             SIZE     MODIFIED
llama3.2:3b   a80c4f17acd5   2.0 GB   15 minutes ago
We used the ollama list command above to see how much disk space our model is using.
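Before moving on to the GUI, you can optionally query the model straight from the container's CLI; the prompt text below is just an illustration:
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama run llama3.2:3b "Say hello in one short sentence."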
While we can interact with Ollama from the command line, we want to use a GUI application to make it easier to run queries and to offer the application to other users. We will be using the Open WebUI project, which we will configure to connect to the Ollama service created earlier:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ cat open-webui.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: open-webui
spec:
  replicas: 1
  selector:
    matchLabels:
      app: open-webui
  template:
    metadata:
      labels:
        app: open-webui
    spec:
      containers:
        - name: open-webui
          image: ghcr.io/open-webui/open-webui:main
          ports:
            - containerPort: 8080
          env:
            - name: OLLAMA_BASE_URL
              value: "http://ollama:11434"
            - name: WEBUI_SECRET_KEY
              value: "your-secret-key"
          volumeMounts:
            - name: webui-data
              mountPath: /app/backend/data
      volumes:
        - name: webui-data
          emptyDir: {}
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: open-webui
spec:
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
  selector:
    app: open-webui
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc apply -f open-webui.yaml
deployment.apps/open-webui created
service/open-webui created
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get deployment
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
ollama       1/1     1            1           30m
open-webui   1/1     1            1           118s
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get pods
NAME                          READY   STATUS    RESTARTS   AGE
ollama-bb4ff999c-5w9fk        1/1     Running   0          27m
open-webui-7584f79cb6-wdqqz   1/1     Running   0          2m2s
To connect from the outside world to our Open WebUI we can create a route, which can be accessed externally. An OpenShift route works like an Ingress in regular Kubernetes: it creates an HTTP ingress point which redirects traffic from the router to a selected service. We will be creating the following route:
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc create route edge --service open-webui
route.route.openshift.io/open-webui created
sergiu@epsilon:~/cc-workspace/curs-09/ollama$ oc get routes.route.openshift.io open-webui -o json | jq -r '.spec.host' | sed 's/^/https:\/\//'
https://open-webui-sergiu-weisz-prj.apps.ocp-demo.grid.pub.ro
The last command gives us the link from which we can access the Open WebUI. Configure the connection and try it out!
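Before opening the link in a browser, you can optionally check that the route answers; the hostname below is the one printed above for this example project, so adjust it for your own (the -k flag skips TLS verification in case the certificate is not trusted locally):
$ curl -k -I https://open-webui-sergiu-weisz-prj.apps.ocp-demo.grid.pub.ro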
DIY: DeepSeek R1 7b
After testing the Open WebUI, download the DeepSeek R1 7b quantized model for Ollama. You can search the Ollama library for it: https://ollama.com/library
You can download the model using the same command as above. You do not need to create a new deployment.
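For reference, the pull command should be similar to the one used for Llama; check the exact model tag on the Ollama library page (the tag below is an assumption):
1000800000@ollama-bb4ff999c-5w9fk:/$ ollama pull deepseek-r1:7b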
Scheduling Jobs
In the context of cloud computing, up until now we have only interacted with applications or services whose lifetime is infinite, which means that they are started and are never stopped unless an error appears.
This does not cover most use cases in distributed computing, though. In many cases processing steps are handled in distinct chunks which are launched and executed by a scheduler. Kubernetes by its nature works as a scheduler, which makes it well suited for scheduling processing jobs.
A Kubernetes Job would be used instead of a Pod when we expect that the action will finish and we do not want the resources of a Pod to linger in the cluster. We have noticed from the liveness probes lab that when a Pod stops it doesn't just shut down, it can be restarted indefinitely, which does not match our discrete workload model.
The object which manages a discrete work item in Kubernetes is called a Job, and it contains a specification for a container, as we are used to from Pod specifications.
The example below shows a Job which prints a debug message:
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-world-job
spec:
  template:
    spec:
      containers:
        - name: hello-world
          image: ghcr.io/containerd/busybox
          command: ["echo", "Hello from Kubernetes batch job!"]
      restartPolicy: Never
  backoffLimit: 4
When applying the above manifest, we can see that the Job is created, and we can inspect its output as follows:
sergiu@epsilon:~/cc-workspace/curs-09$ oc apply -f hello-world.yaml
job.batch/hello-world-job created
sergiu@epsilon:~/cc-workspace/curs-09$ oc get jobs
NAME              COMPLETIONS   DURATION   AGE
hello-world-job   0/1           0s         0s
sergiu@epsilon:~/cc-workspace/curs-09$ oc logs job/hello-world-job
Hello from Kubernetes batch job!
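Note that the job may not have finished by the time you run oc logs (the COMPLETIONS column above still shows 0/1). If the log comes back empty, you can first wait for the job to complete, for example:
$ oc wait --for=condition=complete job/hello-world-job --timeout=60s
$ oc logs job/hello-world-job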
The above example is useful for quick and dirty jobs, but when running in an actual batch environment there are some other factors which have to be taken into account:
- to increase scheduling accuracy and system cohesion, you would add resource limits;
- use a custom job script;
- add fail conditions;
- limit job duration.
The following example creates a more complex Job which runs a custom Python script, limits its resources and requests a restart if the application fails:
apiVersion: batch/v1
kind: Job
metadata:
  name: matrix-multiplication-job
spec:
  template:
    spec:
      containers:
        - name: matrix-multiply
          image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/python:3.9-slim
          command: ["bash", "-c"]
          args:
            - |
              pip install numpy && python /scripts/matrix_multiply.py
          volumeMounts:
            - name: script-volume
              mountPath: /scripts
            - name: pip-local
              mountPath: /.local
            - name: pip-local
              mountPath: /.cache
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
      volumes:
        - name: script-volume
          configMap:
            name: matrix-multiplication-script
        - name: pip-local
          emptyDir: {}
      restartPolicy: OnFailure
  backoffLimit: 2
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: matrix-multiplication-script
data:
  matrix_multiply.py: |
    import numpy as np
    import time
    import os
    # Create large matrices
    size = 5000
    print(f'Creating {size}x{size} matrices...')
    a = np.random.rand(size, size)
    b = np.random.rand(size, size)
    # Perform CPU-intensive matrix multiplication
    print('Starting matrix multiplication...')
    start_time = time.time()
    result = np.matmul(a, b)
    duration = time.time() - start_time
    print(f'Matrix multiplication complete in {duration:.2f} seconds')
    print(f'Result matrix shape: {result.shape}')
The requests dict is used for scheduling purposes: it describes the minimum resources that have to be free on a node for the container to be placed there. The limits dict specifies the actual limits imposed on the container, which it cannot surpass.
As with a regular Pod, ConfigMaps, Secrets and other Kubernetes objects can be mounted into the container.
Let's run it and see its output:
sergiu@epsilon:~/cc-workspace/curs-09$ oc logs job/matrix-multiplication-job
Collecting numpy
Downloading numpy-2.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (19.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.5/19.5 MB 101.7 MB/s eta 0:00:00
Installing collected packages: numpy
Successfully installed numpy-2.0.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
[notice] A new release of pip is available: 23.0.1 -> 25.1.1
[notice] To update, run: pip install --upgrade pip
Creating 5000x5000 matrices...
Starting matrix multiplication...
Matrix multiplication complete in 14.20 seconds
Result matrix shape: (5000, 5000)
Case study: zip cracking
Let's look at a real-world example of cracking a password using fcrackzip and Kubernetes Jobs.
The decrypt-zip.yaml file is the basis for our job. It contains the commands used for cracking the password of a ZIP file. The fcrackzip tool can brute-force a ZIP archive's password. Our task is to download the archive and crack its password.
The following manifest defines our Job and its volumes:
apiVersion: batch/v1
kind: Job
metadata:
  name: zip-decryption-job
  labels:
    app: zip-decryption
spec:
  ttlSecondsAfterFinished: 86400 # Automatically delete job 24h after completion
  backoffLimit: 2 # Number of retries before considering job failed
  template:
    metadata:
      labels:
        app: zip-decryption
    spec:
      restartPolicy: OnFailure
      initContainers:
        - name: download-zip
          image: ghcr.io/curl/curl-container/curl:master # Lightweight curl image
          command: ["/bin/sh", "-c"]
          volumeMounts:
            - name: data-volume
              mountPath: /data
          args:
            - >
              echo "Downloading ZIP file from remote source..." &&
              curl http://swarm.cs.pub.ro/~sweisz/encrypted.zip -o /data/encrypted.zip
      containers:
        - name: hashcat-container
          image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/fcrackzip # Image containing the fcrackzip tool
          command: ["/bin/sh"]
          args:
            - "-c"
            - >
              cd /data &&
              fcrackzip -v -b -c a -l 5-5 -u encrypted.zip > results_lowercase.txt &&
              cat results_lowercase.txt
          volumeMounts:
            - name: data-volume
              mountPath: /data
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
      volumes:
        - name: data-volume
          emptyDir: {}
        - name: wordlist-volume
          configMap:
            name: zip-decrypt-config
We know that the file has a password made up of 5 letters, which led us to use the -l 5-5 option, together with -b to do brute-forcing.
We use the initContainer to download the archive and the main container to run fcrackzip.
Exercise: Crack using wordlist
Change the above job in order to run fcrackzip using the wordlist from the following link: http://swarm.cs.pub.ro/~sweisz/wordlist.txt.
You can attach the wordlist as a ConfigMap as you've seen in the matrix multiplication example.
You can see how to configure fcrackzip to use wordlists in the following link: https://sohvaxus.github.io/content/fcrackzip-bruteforce-tutorial.html.
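A possible starting point (a sketch, not the full solution) is to create the ConfigMap referenced by the wordlist-volume in the manifest above and switch fcrackzip to dictionary mode; remember that you also have to mount wordlist-volume into the main container, and note that the /wordlist mount path below is only a suggestion:
$ wget http://swarm.cs.pub.ro/~sweisz/wordlist.txt
$ oc create configmap zip-decrypt-config --from-file=wordlist.txt
$ # inside the container, the dictionary attack would then look like:
$ # fcrackzip -v -D -p /wordlist/wordlist.txt -u encrypted.zip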
Cronjobs
While regular Jobs are useful from a scheduling point of view, they cannot be set to run periodically or on a timer. CronJobs are a mechanism implemented in Kubernetes to enhance the regular Job feature. They are a type of Job which is managed and scheduled by Kubernetes to run at specific times based on a user-defined rule.
Some use cases which we can define for CronJobs are:
- scheduling regular data exports or backups to off-site facilities
- periodic environment cleanup jobs, for example deleting temporary files or files which have been generated and haven't been used for some time
- crawling endpoints for new data or information
The following is an example manifest for a CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: first-job
spec:
  schedule: "0 2 8 * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: first-job
              image: busybox
              command: ["echo", "First job"]
          restartPolicy: OnFailure
The jobTemplate specification works like a regular Job specification, in which we add the requirements for a job.
The schedule value is specified using the following convention from the cron manual:
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
This means that the above job will run on the 8th day of the month at 2:00 AM. If we want to specify a job which runs every minute, we could make the following change:
- schedule: "0 2 8 * *"
+ schedule: "*/1 * * * *"
The */x syntax means the job will run every x minutes.
For an easy way to define the cron schedule, you can use https://crontab.guru/.
Case study: Database backup
For this case study we will be running a PostgreSQL database server defined by the following manifest:
# PostgreSQL Pod
apiVersion: v1
kind: Pod
metadata:
  name: postgres-db
  labels:
    app: postgres
spec:
  containers:
    - name: postgres
      image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/postgres:14-alpine
      ports:
        - containerPort: 5432
          name: postgres
      env:
        - name: PGDATA
          value: /var/lib/postgresql/data/pg/
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: password
        - name: POSTGRES_DB
          valueFrom:
            secretKeyRef:
              name: postgres-credentials
              key: database
      volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data/
  volumes:
    - name: postgres-data
      emptyDir: {}
---
# Service for PostgreSQL
apiVersion: v1
kind: Service
metadata:
  name: postgres-service
spec:
  ports:
    - port: 5432
      targetPort: 5432
  selector:
    app: postgres
The pgsql.yaml file deploys a database server. For this database server we need to create backups which will be stored in another volume and then shipped off-site.
In order to prepare the setup, we first need to create the database that we will be backing up. Run the following command in the lab directory to set up the database deployment and service:
oc apply -f pgsql.yaml
We will start from the following already created CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup-container
              image: gitlab.cs.pub.ro:5050/scgc/cloud-courses/postgres:14-alpine
              command:
                - /bin/sh
                - -c
                - |
                  # Set date format for backup filename
                  BACKUP_DATE=$(date +\%Y-\%m-\%d-\%H\%M)
                  # Create backup
                  echo "Starting PostgreSQL backup at $(date)"
                  mkdir /tmp/backups
                  pg_dump \
                    -h ${DB_HOST} \
                    -U ${DB_USER} \
                    -d ${DB_NAME} \
                    -F custom \
                    -Z 9 \
                    -f /tmp/backups/${DB_NAME}-${BACKUP_DATE}.pgdump
              env:
                - name: DB_HOST
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: host
                - name: DB_USER
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: username
                - name: DB_NAME
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: database
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: password
          restartPolicy: OnFailure
---
# Secret for database credentials
apiVersion: v1
kind: Secret
metadata:
  name: postgres-credentials
type: Opaque
data:
  host: cG9zdGdyZXMtc2VydmljZQ== # postgres-service (base64 encoded)
  username: YmFja3VwX3VzZXI= # backup_user (base64 encoded)
  password: c2VjdXJlUGFzc3dvcmQxMjM= # securePassword123 (base64 encoded)
  database: cHJvZHVjdGlvbl9kYg== # production_db (base64 encoded)
The above CronJob creates a backup of the database using pg_dump and puts it in a temporary location.
Apply the manifests so we can see the backup in action.
sergiu@epsilon:~/ocp/upgrade$ oc get cronjobs
NAME              SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
postgres-backup   */1 * * * *   False     0        35s             39m
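Each scheduled run spawns a regular Job and a pod, so you can check the result of a backup run with the usual commands; the pod name is generated, so replace it with the one shown in your project:
$ oc get jobs
$ oc get pods
$ oc logs <postgres-backup-pod-name>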
The issue with the above CronJob is that although it creates a backup file, it doesn't add it to any kind of persistent storage.
Create an emptyDir volume, mount it at the /backup path, and change the backup script so that it copies the backup files to the backup volume.
Change the backup schedule so that it only does a backup every hour.
Change the policy so that only one backup job can run at a time. Look into the documentation to see how to disallow concurrent jobs: https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/.
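As a hint for the last two tasks, the relevant fields live directly under the CronJob spec; the values below are one valid choice, not the only one:
spec:
  schedule: "0 * * * *"        # run at minute 0 of every hour
  concurrencyPolicy: Forbid    # do not start a new run while one is still active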