Introduction to MLOps

MLOps is a set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.

Data science and ML are becoming core capabilities for solving complex real-world problems, transforming industries, and delivering value in all domains. Currently, the ingredients for applying effective ML are available to you:

  • Large datasets
  • Inexpensive on-demand compute resources
  • Specialized accelerators for ML on various cloud platforms
  • Rapid advances in different ML research fields (such as computer vision, natural language understanding, and recommendation systems)

Therefore, many businesses are investing in their data science teams and ML capabilities to develop predictive models that can deliver business value to their users.

Here are a few techniques and tools that can help you deliver reliable, valuable, and well-structured code and projects.

MLOps: Continuous delivery and automation pipelines in machine learning

This document from Google Cloud discusses techniques for implementing and automating continuous integration (CI), continuous delivery (CD), and continuous training (CT) for machine learning (ML) systems. In detail, the document covers:

- DevOps vs MLOps

- Data Science steps for ML

- MLOps level 0: Manual process

  • The process for building and deploying ML models is entirely manual.

- MLOps level 1: ML pipeline automation

  • It performs continuous training of the model by automating the ML pipeline, which enables continuous delivery of the model prediction service. The pipeline automates model validation and model retraining in production with new data.

- MLOps level 2: CI/CD pipeline automation

  • For a rapid and reliable update of the pipelines in production, you need an automated CI/CD system.
  • This automated CI/CD system lets your data scientists rapidly explore new ideas around feature engineering, model architecture, and hyperparameters.
  • They can implement these ideas and automatically build, test, and deploy the new pipeline components to the target environment.
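The continuous-training loop behind levels 1 and 2 can be sketched in a few lines of Python. Every function below (`validate_data`, `train_model`, `evaluate`, `run_pipeline`) is a hypothetical placeholder standing in for whatever your pipeline actually uses; this is an illustration of the gating logic, not a real framework:

```python
# Minimal sketch of an automated continuous-training (CT) pipeline:
# validate new data, retrain, and deploy only if the candidate improves
# on the current baseline. All steps are toy placeholders.

def validate_data(dataset):
    """Data validation gate: reject obviously broken input before training."""
    return len(dataset) > 0 and all(len(row) == 2 for row in dataset)

def train_model(dataset):
    """Stand-in 'model': predict the mean of the targets."""
    mean = sum(y for _, y in dataset) / len(dataset)
    return lambda x: mean

def evaluate(model, dataset):
    """Mean squared error of a candidate model."""
    return sum((model(x) - y) ** 2 for x, y in dataset) / len(dataset)

def run_pipeline(new_data, baseline_mse):
    """Retrain on new data; return the candidate only if it beats the baseline."""
    if not validate_data(new_data):
        return None  # validation gate failed, keep the current model
    model = train_model(new_data)
    return model if evaluate(model, new_data) < baseline_mse else None

data = [(1, 2.0), (2, 2.2), (3, 1.8)]
candidate = run_pipeline(data, baseline_mse=1.0)
print("deploy" if candidate else "keep current model")
```

In a real level 1 or 2 setup, the same gates exist but are triggered by schedules or new-data events, and the deploy step pushes a prediction service rather than returning an object.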


DVC

Data Version Control, or DVC, is a data and ML experiment management tool that takes advantage of existing toolsets like Git and CI/CD.

This addresses a critical challenge: ML algorithms and methods are still difficult to implement, reuse, and manage.

Basic uses:

DVC is useful if you store and process data files or datasets to produce other data or machine learning models, and you want to

  • track and save data and machine learning models the same way you capture code;
  • create and switch between versions of data and ML models easily;
  • understand how datasets and ML artifacts were built in the first place;
  • compare model metrics among experiments;
  • adopt engineering tools and best practices in data science projects;

then DVC is for you.

To learn more, see the DVC documentation.
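The idea at DVC's core is that large data files stay out of Git; what gets committed is a small metafile recording each file's content hash, so changing the data produces a new, switchable version. That mechanism can be sketched in plain Python (a conceptual illustration only; real DVC metafiles are YAML and this is not DVC's API):

```python
# Sketch of DVC-style data versioning: commit a tiny hash-bearing
# metafile to Git instead of the data file itself.
import hashlib
import json
from pathlib import Path

def snapshot(data_path: Path) -> str:
    """Write a DVC-style metafile recording the data file's MD5.
    The metafile goes into Git; the data file itself is git-ignored
    and stored in a cache or remote keyed by that hash."""
    md5 = hashlib.md5(data_path.read_bytes()).hexdigest()
    meta = {"outs": [{"md5": md5, "path": data_path.name}]}
    Path(str(data_path) + ".dvc").write_text(json.dumps(meta, indent=2))
    return md5

data = Path("dataset.csv")
data.write_text("x,y\n1,2\n")
v1 = snapshot(data)            # first data version

data.write_text("x,y\n1,2\n3,4\n")
v2 = snapshot(data)            # data changed, so the hash changed

print(v1 != v2)  # True: two distinct versions Git can track and restore
```

Switching between versions then amounts to checking out the metafile in Git and restoring the matching data from the cache, which is what `dvc checkout` does.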


Cookiecutter Data Science

Cookiecutter Data Science builds a logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. Those end products are generally the main event, so it is easy to focus on making them look nice and to ignore the quality of the code that generates them. But because these end products are created programmatically, code quality is still important.

Tentative experiments and rapidly testing approaches that might not work out are all part of the process for getting to the good stuff.

So, it's best to start with a clean, logical structure of your code or project and stick to it throughout.

Cookiecutter Data Science helps us with that. To learn more, see its documentation.
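The kind of layout the template produces can be approximated in a few lines of Python. The directory names below are a subset of the template's documented structure (`data/raw`, `data/processed`, `notebooks`, `src`, `models`, `reports`); in practice you would run the `cookiecutter` tool itself rather than a script like this:

```python
# Illustrative subset of the Cookiecutter Data Science layout,
# created manually; the real template is generated with `cookiecutter`.
from pathlib import Path

LAYOUT = [
    "data/raw",        # original, immutable data dumps
    "data/processed",  # final datasets ready for modeling
    "notebooks",       # exploratory Jupyter notebooks
    "src",             # reusable project source code
    "models",          # trained, serialized models
    "reports",         # generated analysis and figures
]

def scaffold(root: str) -> Path:
    """Create the project skeleton under `root`."""
    base = Path(root)
    for rel in LAYOUT:
        (base / rel).mkdir(parents=True, exist_ok=True)
    (base / "README.md").write_text("# my_project\n")
    return base

project = scaffold("my_project")
print(sorted(p.name for p in project.iterdir()))
```

The point of the convention is that raw data is never edited in place, processed data is reproducible from `src`, and anyone joining the project knows where to look.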

MLflow

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle.

  • Tracking experiments to record and compare parameters and results (MLflow Tracking)
  • Packaging ML code in a reusable, reproducible form in order to share or transfer to production (MLflow Projects).
  • Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms (MLflow Models).
  • Providing a central model store to collaboratively manage the full lifecycle of an MLflow Model, including model versioning, stage transitions, and annotations (MLflow Model Registry).

You can use it with any machine learning library and in any programming language, since all MLflow functions are accessible through a REST API and CLI. To learn more, see the MLflow documentation.




