[Part 4] Simplifying DataOps - Persistence in our container world
Last time we installed Kubernetes on a DGX system. But that is only half the battle, because we did not deploy any application on it, right? So, the next step is to install something cool on top of Kubernetes to do even cooler things.
Earlier we mentioned two solutions that need an orchestrator to get up and running: JupyterHub and Kubeflow. Since we want to play with AI pipelines and general machine learning topics, we decided to start with Kubeflow (https://www.kubeflow.org/). It is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable.
There is still one thing missing: Kubeflow works with different kinds of data. There is raw data for preprocessing, then a model is created through training and must be deployed to the inference server. One of the biggest challenges with containers is data persistence: when a container is lost or deleted, all data inside it is lost too. At this point you should start thinking about a solution to store data in the backend.
For a few years now, NetApp has been offering an open-source storage orchestrator for Kubernetes called “Trident” (https://github.com/NetApp/trident). With this software you can create storage classes inside the Kubernetes cluster, which can then be used to create persistent volumes (PVs). Those PVs are claimed within deployments and attached to container instances; from this point on, data written to the PV persists beyond the lifecycle of the container.
Long story short: how do we get Trident installed? Glad you asked!
(There is a full tutorial at https://netapp-trident.readthedocs.io/en/stable-v19.04/ – we only cover the most important steps here to keep the article short.)
First, we download the binaries from the GitHub repository and unpack them on the Kubernetes master node:
wget https://github.com/NetApp/trident/releases/download/v19.04.1/trident-installer-19.04.1.tar.gz
tar -xf trident-installer-19.04.1.tar.gz
cd trident-installer
Trident is installed into its own namespace, so we have to create one before we can continue:
dgxadmin@cbc-dgx1ai01:~$ kubectl create namespace trident
Afterwards we create the configuration file used to connect to our NetApp system.
dgxadmin@cbc-dgx1ai01:~/trident-install/setup$ cat backend.json
{
  "version": 1,
  "storageDriverName": "ontap-nas",
  "managementLIF": "ip_of_the_tenant_mgmt_interface",
  "dataLIF": "ip_of_the_tenant_data_interface",
  "svm": "name_of_the_tenant",
  "username": "some_user",
  "password": "some_password",
  "defaults": {
    "spaceReserve": "none",
    "exportPolicy": "default"
  }
}
Next, we do a dry run to check whether everything is fine:
dgxadmin@cbc-dgx1ai01:~/trident-install$ ./tridentctl install --dry-run -n trident
No errors? Great! We can continue and do the actual installation without the “--dry-run” flag. After a successful installation, Trident pods exist in the given namespace:
dgxadmin@cbc-dgx1ai01:~/trident-install$ ./tridentctl install -n trident
dgxadmin@cbc-dgx1ai01:~$ kubectl get pods -n trident
NAME                       READY   STATUS    RESTARTS   AGE
trident-5676b84b5c-mxrp4   2/2     Running   0          2min
The next step is to create “backends”, which are used to attach different storage systems. We create one backend to provision storage, using the same file we mentioned for the installation:
dgxadmin@cbc-dgx1ai01:~/trident-install$ ./tridentctl -n trident create backend -f setup/backend.json
+------------------------+----------------+--------+---------+
| NAME                   | STORAGE DRIVER | ONLINE | VOLUMES |
+------------------------+----------------+--------+---------+
| ontapnas_XXX.XXX.XX.XX | ontap-nas      | true   | 0       |
+------------------------+----------------+--------+---------+
With backends configured, we still cannot create persistent volumes, because we lack storage classes, which represent the “qualities” of storage (SSD, NL-SAS etc.). Different parameters can be set for each class to meet your SLAs – e.g. one common concept is to have a “gold”, a “silver” and a “bronze” class representing different performance and/or availability levels.
dgxadmin@cbc-dgx1ai01:~/trident-install$ cat setup/stg-cls-gold.yaml
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: gold
provisioner: netapp.io/trident
parameters:
  media: ssd
  provisioningType: thin
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl apply -f setup/stg-cls-gold.yaml
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl get storageclasses
NAME   PROVISIONER         AGE
gold   netapp.io/trident   28m
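To illustrate the tiering idea, a second class could map to slower media. A minimal sketch – the “bronze” name and the `media: hdd` value are assumptions here and must match what your Trident backend actually exposes:

```yaml
# Hypothetical "bronze" tier backed by spinning disks - a sketch only;
# adjust the media parameter to the disk types of your aggregates.
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: bronze
provisioner: netapp.io/trident
parameters:
  media: hdd
  provisioningType: thin
```

Applied with `kubectl apply`, a PVC can then select a tier simply by setting its `storageClassName`.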
NOW we are good to go with our very first container claiming persistent storage! Feels good, doesn’t it? We will deploy a small web application with a small persistent volume.
The persistent volume claim (PVC) gets created first:
dgxadmin@cbc-dgx1ai01:~/trident-install$ cat setup/pvc_test.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: anynameforyourpvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: gold
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl apply -f setup/pvc_test.yaml
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl get pvc
NAME                STATUS   VOLUME                            CAPACITY   ACCESS MODES   STORAGECLASS   AGE
anynameforyourpvc   Bound    default-anynameforyourpvc-24c5a   10Gi       RWO            gold           28s
Afterwards we create a pod with a container that mounts the claim as storage:
dgxadmin@cbc-dgx1ai01:~/trident-install$ cat setup/pod_test.yaml
kind: Pod
apiVersion: v1
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: anynameforyourpvc
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl apply -f setup/pod_test.yaml
dgxadmin@cbc-dgx1ai01:~/trident-install$ kubectl exec -it task-pv-pod -- df -h /usr/share/nginx/html
Filesystem                                               Size  Used  Avail  Use%  Mounted on
xxx.xxx.xx.xx:/trident_default_anynameforyourpvc_24c5a   10G   128K  10G    1%    /usr/share/nginx/html
Congratulations! There it is - volume #1 inside a container infrastructure - well done!
What does this mean?
We created persistent storage for a deployment within Kubernetes – and we did it using Kubernetes only. There was no need to know anything about the storage in the background or any other infrastructure-related parts. It is also possible to have multiple storage classes to build different performance and availability tiers (gold, silver, bronze).
Kubeflow on top makes it even easier: it integrates solutions like Jupyter notebooks, which are created automatically on demand – including persistent storage for later use. For other applications in the stack (e.g. Pipelines) you can use persistent volumes inside your CRDs to get storage provisioned when necessary. We do not copy data anymore (unless we want to 😉) …
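To sketch how such on-demand storage can work underneath (the names below are illustrative and not taken from the actual Kubeflow manifests): a plain Kubernetes StatefulSet can carry a volumeClaimTemplate, so every notebook replica automatically gets its own PVC, provisioned by Trident through the “gold” class we created above:

```yaml
# Illustrative only - not the real Kubeflow notebook controller manifest.
# A StatefulSet with a volumeClaimTemplate gives each replica its own PVC,
# dynamically provisioned via the Trident-backed "gold" storage class.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-notebook
spec:
  serviceName: demo-notebook
  replicas: 1
  selector:
    matchLabels:
      app: demo-notebook
  template:
    metadata:
      labels:
        app: demo-notebook
    spec:
      containers:
        - name: notebook
          image: jupyter/minimal-notebook
          volumeMounts:
            - name: workspace
              mountPath: /home/jovyan/work   # notebook files survive pod restarts
  volumeClaimTemplates:
    - metadata:
        name: workspace
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gold
        resources:
          requests:
            storage: 5Gi
```

Deleting and recreating the pod keeps the volume, which is exactly the behavior we demonstrated manually with the nginx example.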
--- References ---
Blog [Part 3] can be found here: https://www.garudax.id/pulse/part-3-simplifying-dataops-kubernetes-nvidia-dgx-worth-steve-guhr/
Blog [Part 2] can be found here: https://www.garudax.id/pulse/part-2-simplifying-dataops-datascience-service-jupyter-steve-guhr
Blog [Part 1] can be found here: https://www.garudax.id/pulse/part-1-simplifying-dataops-my-wedding-meets-data-science-steve-guhr/
There is also a blog with a strong data science background from my colleague Muneer: https://www.garudax.id/pulse/simplify-machine-learning-version-control-muneer-ahmad-dedmari/