MLOps
I completed one of the tasks assigned by Sir Vimal Daga, a world record holder, during my industrial training under LinuxWorld India Pvt. Ltd.
The task is to automate the process of training and tuning an ML or DL model until it reaches the desired accuracy. This is achieved by integrating the concepts of Deep Learning with the operational practices of DevOps.
The work starts with creating two Dockerfiles: one with the libraries required for Machine Learning models preinstalled, and the other for Deep Learning models. I created the below-mentioned Dockerfiles to build those images.
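As a rough illustration, the Deep Learning Dockerfile can look like the sketch below. The base image and the exact package list are assumptions, not the originals; the ML image would swap the DL stack for something like scikit-learn.

```dockerfile
# Illustrative DL training image (package set is assumed, not the original)
FROM python:3.8-slim
RUN pip install --no-cache-dir numpy pandas keras tensorflow
WORKDIR /code
# The training script is copied/mounted in by the Jenkins job at run time
CMD ["python3", "project.py"]
```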
Now the main task begins, in which I created 5 Jenkins jobs chained via Build Pipeline, as described below.
Job1: As soon as the developer pushes the code and dataset to GitHub, this job pulls them and copies both into our training environment.
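A Jenkins build step for Job1 could look like the following sketch. The repo URL is the one from this write-up, but the clone directory and the staging directory name are assumptions, and the network-dependent clone is guarded so the sketch degrades to a no-op offline.

```shell
# Hypothetical Job1 build step: fetch the latest push and stage it for training.
REPO=https://github.com/akash335saini/DLwork.git
DEST=training_env                                 # assumed staging directory
git clone "$REPO" repo_copy 2>/dev/null || true   # no-op here if offline
mkdir -p "$DEST"
cp repo_copy/*.py "$DEST"/ 2>/dev/null || true    # copy code and dataset files
echo "Job1: staged code in $DEST"
```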
Job2: It first checks whether the "project.py" file pushed by the developer contains Machine Learning or Deep Learning code, by analyzing the libraries the code imports. The job then launches a container from the appropriate Docker image, and as soon as the container is up, the training process of the model starts.
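The ML-versus-DL check can be as simple as grepping project.py's imports. In this sketch a sample project.py is written first so the snippet is self-contained (in the real job it arrives from Job1); the image names and grep patterns are assumptions, and the container launch is shown as a comment.

```shell
# Demo input: in the real pipeline, Job1 already placed project.py here.
printf 'from keras.models import Sequential\n' > project.py

# Classify the code by the libraries it imports (patterns are assumptions)
if grep -qE 'keras|tensorflow' project.py; then
  IMAGE=dl_env          # hypothetical DL image name
else
  IMAGE=ml_env          # hypothetical ML image name
fi
echo "selected image: $IMAGE"

# In the real job, training then starts inside the chosen container, e.g.:
# docker run --name trainer -v "$PWD":/code "$IMAGE" python3 /code/project.py
```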
Job3: This job copies the trained model (model.h5) and the accuracy (accuracy.txt) from the Docker container to the base OS of the training environment for further processing.
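Job3 boils down to two `docker cp` commands. The container name `trainer` and the `/code` path inside it are assumptions, and the commands are guarded so the sketch is a harmless no-op on a machine without the container.

```shell
# Hypothetical Job3 step: pull the artifacts out of the training container.
if command -v docker >/dev/null 2>&1; then
  docker cp trainer:/code/model.h5 .     || true   # trained model
  docker cp trainer:/code/accuracy.txt . || true   # recorded accuracy
fi
echo "Job3: artifacts copied to $(pwd)"
```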
Job4: Here comes the interesting part. I created a tweek.py file that is meant to make changes to the main code file, i.e., project.py. As soon as Job4 runs, tweek.py executes and first checks whether the accuracy we got is less than the required accuracy (in this case I set 90%). If it is, the script adds one more convolution and max-pooling layer to the code and returns a failure response, which triggers Job2 to start the training process again. We could tune many things, such as the number of neurons or epochs, but here I am just adding a convolution and max-pooling layer in each iteration.
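The core of tweek.py can be sketched like this. The 90% threshold matches the write-up, but the marker comment assumed to exist in project.py and the exact layer parameters are my own illustration, not the original file.

```python
# Sketch of tweek.py's logic: read the achieved accuracy, and if it falls
# short of the target, splice one more Conv2D + MaxPooling2D layer into the
# training script and report failure so Jenkins re-triggers Job2.
REQUIRED = 0.90  # target accuracy from the write-up

EXTRA_LAYERS = (
    "model.add(Conv2D(64, (3, 3), activation='relu'))\n"
    "model.add(MaxPooling2D(pool_size=(2, 2)))\n"
)

def tweak(code_path="project.py", acc_path="accuracy.txt"):
    with open(acc_path) as f:
        accuracy = float(f.read().strip())
    if accuracy >= REQUIRED:
        return True  # success: Job4 passes and Job5 publishes the model
    with open(code_path) as f:
        src = f.read()
    # Assumed marker comment left in project.py where new layers belong
    src = src.replace("# ADD-LAYERS-HERE\n", EXTRA_LAYERS + "# ADD-LAYERS-HERE\n")
    with open(code_path, "w") as f:
        f.write(src)
    return False  # failure: a non-zero exit makes Jenkins re-run Job2
```

In the Jenkins job, the build step would run this and exit non-zero on the `False` path so that the failure status propagates to the pipeline.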
Job5: If Job4 is successful, this job is triggered; it copies the model from the Jenkins workspace to the root directory and notifies that the model is trained.
I created one more job, Job6, which keeps monitoring Job2, where the main training work goes on. If the training container goes down or Job2 throws a failure, Job6 triggers Job2 again.
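Job6 is essentially a watchdog. A sketch of its scheduled check is below; the container name, the Jenkins URL, and the build-trigger token are all assumptions, and both external calls are guarded so the sketch degrades to a no-op where Docker or Jenkins is unreachable.

```shell
# Hypothetical Job6 check, run periodically (e.g. a cron-style build trigger).
RUNNING=$(docker ps --format '{{.Names}}' 2>/dev/null | grep -c '^trainer$' || true)
if [ "$RUNNING" -eq 0 ]; then
  echo "training container down - re-triggering Job2"
  # Jenkins remote build trigger; URL and token are assumptions
  curl -fs "http://localhost:8080/job/Job2/build?token=RETRAIN" || true
fi
```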
I used the MNIST dataset for this work, so with just 2 convolution layers and 1 epoch I got around 92% accuracy. I have uploaded both the initial code I used and the tweek.py file to GitHub.
GitHub URL: https://github.com/akash335saini/DLwork.git
A Final View: