[Part 4] Simplifying DataOps - Persistence in our container world
Last time we installed Kubernetes on a DGX system. But that is only half the battle, because we did not deploy any application on it, right? So, the next step is to install something cool on top of Kubernetes to do even cooler things.
Earlier we mentioned two solutions that need an orchestrator to get up and running: JupyterHub and Kubeflow. Since we want to play with AI pipelines and general machine learning topics, we decided to start with Kubeflow (https://www.kubeflow.org/). It is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
There is still one thing missing: Kubeflow works with different kinds of data. There is raw data for preprocessing, then a model is created through training and must be deployed to the inference server. One of the biggest challenges with containers is data persistence: when a container is lost or deleted, all data inside it is lost too. At this point you should start thinking about a solution to store data in the backend.
For a few years now, NetApp has been offering an open-source storage orchestrator for Kubernetes called “Trident” (https://github.com/NetApp/trident). With this software you can create storage classes inside the Kubernetes cluster, which can then be used to create persistent volumes (PVs). Those PVs are claimed within deployments and attached to container instances; from this point on, data written to the PV persists beyond the lifecycle of the container.
Long story short: how do we get Trident installed? Glad you asked!
(There is a full tutorial at https://netapp-trident.readthedocs.io/en/stable-v19.04/ – we only cover the most important steps here to keep the article short.)
First, we download the binaries from the GitHub repository and unpack them on the Kubernetes master node:
wget https://github.com/NetApp/trident/releases/download/v19.04.1/trident-installer-19.04.1.tar.gz
tar -xf trident-installer-19.04.1.tar.gz
cd trident-installer
Trident is installed into its own namespace, so we have to create one before we can continue:
dgxadmin@cbc-dgx1ai01:~$ kubectl create namespace trident
Afterwards we create the configuration file used to connect to our NetApp system.
dgxadmin@cbc-dgx1ai01:~/trident-install/setup$ cat backend.json
{
  "version": 1,
  "storageDriverName": "ontap-nas",
  "managementLIF": "ip_of_the_tenant_mgmt_interface",
  "dataLIF": "ip_of_the_tenant_data_interface",
  "svm": "name_of_the_tenant",
  "username": "some_user",
  "password": "some_password",
  "defaults": {
    "spaceReserve": "none",
    "exportPolicy": "default"
  }
}
Next, we do a dry run to check whether everything is fine:
dgxadmin@cbc-dgx1ai01:~/trident-install$ ./tridentctl install --dry-run -n trident
No errors? Great! We can continue and do the actual installation without the “--dry-run” flag. After a successful installation, Trident pods exist in the given namespace:
dgxadmin@cbc-dgx1ai01:~/trident-install$ ./tridentctl install -n trident
dgxadmin@cbc-dgx1ai01:~$ kubectl get pods -n trident
NAME                       READY   STATUS    RESTARTS   AGE
trident-5676b84b5c-mxrp4   2/2     Running   0          2min
The next step is to create “backends”, which are used to attach different storage systems. We create one backend to provision storage, using the same file we mentioned for the installation:
dgxadmin@cbc-dgx1ai01:~/trident-install$ ./tridentctl -n trident create backend -f setup/backend.json
+------------------------+----------------+--------+---------+
| NAME                   | STORAGE DRIVER | ONLINE | VOLUMES |
+------------------------+----------------+--------+---------+
| ontapnas_XXX.XXX.XX.XX | ontap-nas      | true   | 0       |
+------------------------+----------------+--------+---------+
With backends configured, we still cannot create persistent volumes, because we lack storage classes, which represent the “qualities” of storage (SSD, NL-SAS etc.). Different parameters can be set for each class to meet your SLAs – e.g. one common concept is to have a “gold”, a “silver” and a “bronze” class representing different performance and/or availability levels.
dgxadmin@cbc-dgx1ai01:~/trident-install$ cat setup/stg-cls-gold.yaml
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: gold
provisioner: netapp.io/trident
parameters:
  media: ssd
  provisioningType: thin
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl apply -f setup/stg-cls-gold.yaml
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl get storageclasses
NAME   PROVISIONER         AGE
gold   netapp.io/trident   28m
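To illustrate the tiering idea, a second class could map to slower media. A minimal sketch – the “bronze” name and the `media: hdd` value are assumptions here and must match what your Trident backend actually exposes:

```yaml
# Hypothetical "bronze" tier backed by spinning disks - a sketch only;
# adjust the media parameter to the disk types of your aggregates.
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: bronze
provisioner: netapp.io/trident
parameters:
  media: hdd
  provisioningType: thin
```

Applied with `kubectl apply`, a PVC can then select a tier simply by setting its `storageClassName`.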
NOW we are good to go with our very first container claiming persistent storage! Feels good, doesn’t it? We will deploy a small web application with a small persistent volume.
The persistent volume claim (PVC) gets created first:
dgxadmin@cbc-dgx1ai01:~/trident-install$ cat setup/pvc_test.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: anynameforyourpvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gold
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl apply -f setup/pvc_test.yaml
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl get pvc
NAME                STATUS   VOLUME                            CAPACITY   ACCESS MODES   STORAGECLASS   AGE
anynameforyourpvc   Bound    default-anynameforyourpvc-24c5a   10Gi       RWO            gold           28s
Afterwards we create a pod with a container that mounts the claim as storage:
dgxadmin@cbc-dgx1ai01:~/trident-install$ cat setup/pod_test.yaml
kind: Pod
apiVersion: v1
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: anynameforyourpvc
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl apply -f setup/pod_test.yaml
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl exec -it task-pv-pod -- df -h /usr/share/nginx/html
Filesystem                                               Size  Used  Avail  Use%  Mounted on
xxx.xxx.xx.xx:/trident_default_anynameforyourpvc_24c5a   10G   128K  10G    1%    /usr/share/nginx/html
Congratulations! There it is - volume #1 inside a container infrastructure - well done!
What does this mean?
We created persistent storage for a deployment within Kubernetes – and we did it using Kubernetes only. There was no need to know anything about the storage in the background or any other infrastructure-related parts. It is also possible to have multiple storage classes to build different performance and availability tiers (gold, silver, bronze).
Kubeflow on top makes it even easier: it integrates solutions like Jupyter notebooks, which are created automatically on demand – including persistent storage for later use. For other applications in the stack (e.g. Pipelines) you can use persistent volumes inside your CRDs to get storage provisioned when necessary. We do not copy data anymore (unless we want to 😉) …
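To sketch how such on-demand storage can work underneath (the names below are illustrative and not taken from the actual Kubeflow manifests): a plain Kubernetes StatefulSet can carry a volumeClaimTemplate, so every notebook replica automatically gets its own PVC, provisioned by Trident through the “gold” class we created above:

```yaml
# Illustrative only - not the real Kubeflow notebook controller manifest.
# A StatefulSet with a volumeClaimTemplate gives each replica its own PVC,
# dynamically provisioned via the Trident-backed "gold" storage class.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-notebook
spec:
  serviceName: demo-notebook
  replicas: 1
  selector:
    matchLabels:
      app: demo-notebook
  template:
    metadata:
      labels:
        app: demo-notebook
    spec:
      containers:
        - name: notebook
          image: jupyter/minimal-notebook
          volumeMounts:
            - name: workspace
              mountPath: /home/jovyan/work   # notebook files survive pod restarts
  volumeClaimTemplates:
    - metadata:
        name: workspace
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gold
        resources:
          requests:
            storage: 5Gi
```

Deleting and recreating the pod keeps the volume, which is exactly the behavior we demonstrated manually with the nginx example.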
--- References ---
Blog [Part 3] can be found here: https://www.garudax.id/pulse/part-3-simplifying-dataops-kubernetes-nvidia-dgx-worth-steve-guhr/
Blog [Part 2] can be found here: https://www.garudax.id/pulse/part-2-simplifying-dataops-datascience-service-jupyter-steve-guhr
Blog [Part 1] can be found here: https://www.garudax.id/pulse/part-1-simplifying-dataops-my-wedding-meets-data-science-steve-guhr/
There is also a blog with a strong data science background from my colleague Muneer: https://www.garudax.id/pulse/simplify-machine-learning-version-control-muneer-ahmad-dedmari/