MLOps - Production Models
Machine learning is an amazing tool for tackling some of the biggest problems we now face, but how do you release and maintain models that are constantly changing, growing, or even running in different 'run-modes'?
A 'typical' machine learning pipeline will, by its nature, have an iterative cycle with many possible release candidates, upgrades and production models. The following is what I consider the lifecycle of a model.
This lifecycle can have overlap, and it can depend on the model build/test/promote cycle (some teams keep training continuously and create 'snapshots' as releases).
Ingesting Data
Ingestion depends on the data, but most pipelines go through some level of validation, labelling, bias correction and anonymisation. Generally, the raw data is collected and then either labelled or moved into separate storage for actual training.
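As a minimal sketch of that acceptance gate, the function below accepts a raw file into the training store only if it passes a trivial validation check. The CSV check and the directory layout are illustrative assumptions, not a prescribed format.

```shell
#!/bin/sh
# Sketch of an ingest gate: move a raw file into training storage
# only if it is non-empty and the header row looks like CSV.
# The check and paths are illustrative assumptions.
ingest_file() {
  f="$1"; dest="$2"
  if [ -s "$f" ] && head -n1 "$f" | grep -q ','; then
    mv "$f" "$dest/" && echo "accepted: $f"
  else
    echo "rejected: $f" >&2
    return 1
  fi
}
```

In a real pipeline this check would be replaced by schema validation, labelling and anonymisation steps appropriate to the data.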
Build Model
The build can produce a release candidate model either at the end of an iteration or as part of a 'checkpoint'. The candidate should be placed in the release candidate storage, with the version history used to determine what the next increment should be. The example above uses semantic versioning; obviously, there are other schemes (datetime, SHA, UUID4).
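Deriving the next increment from the version history can be as simple as a patch bump on the latest stored version. The sketch below assumes semantic versioning, as in the example above; how the latest version is looked up in your storage is left out.

```shell
#!/bin/sh
# Sketch of a semantic-version patch bump for a release candidate.
# The caller supplies the latest known version, e.g. from listing
# the release candidate storage (an illustrative assumption).
next_patch() {
  latest="$1"                    # e.g. "1.43.12"
  major=${latest%%.*}
  rest=${latest#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  echo "$major.$minor.$((patch + 1))"
}
```

So `next_patch 1.43.12` yields `1.43.13`, the model tag used in the deployment examples below.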
Test Model
The testing stage can be built into the training, but it can also be a separate external validation of the model. In either case, on success it should promote the model into validation; this is a gatekeeping step at which a failed model can be deleted or allowed to expire (depending on your storage).
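That gatekeeping step can be sketched as a promote-or-delete decision around whatever evaluation command you run. The evaluation command and storage layout here are illustrative assumptions.

```shell
#!/bin/sh
# Sketch of the test gate: run an evaluation command against the
# candidate and either promote it into validation storage or delete
# it. The evaluation command and paths are illustrative assumptions.
gate_model() {
  model="$1"; validated_dir="$2"; eval_cmd="$3"
  if "$eval_cmd" "$model"; then
    mv "$model" "$validated_dir/"   # promote to validation
  else
    rm -f "$model"                  # expire the failed candidate
    return 1
  fi
}
```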
Validate Model
The validation stage could be combined with testing, but it is specifically less about accuracy or model improvements and more about the performance and size of the model. Ideally, a performance/size threshold can be set so that if a model exceeds it, the model is never promoted to production and will expire or be deleted. If the model is successful, the version information should be marked in some way to denote a new production model, and a copy placed into the production model storage. Ideally, a dashboard or similar reporting on model performance should give feedback on any sudden changes or changes over time (which is important for production service level agreements etc).
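A size threshold of the kind described above can be a one-line check before promotion. The default budget here is an illustrative assumption; the real limit would come from your production SLAs.

```shell
#!/bin/sh
# Sketch of a size-budget check used during validation: succeed only
# if the model file is within the agreed byte limit.
# The default of 100 MiB is an illustrative assumption.
within_size_budget() {
  model="$1"; max_bytes="${2:-104857600}"
  size=$(wc -c < "$model")
  [ "$size" -le "$max_bytes" ]
}
```

A model failing this check would simply never be copied into production storage, so it expires with the rest of the rejected candidates.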
Release Model
This stage could simply be the ML application's release process, but if we are using containers to ship the model data, they can be built here and released to production. The release process could include a staged release strategy or a blue/green style deployment; it is at this stage that any release management on the system(s) takes effect.
Model Performance
The performance of the model in the production systems should be constantly evaluated from a host/application metrics point of view. This can help with improvements in the model engine and the model itself, and as the application matures you can start checking for edge cases or regressions in performance at scale. Feedback is a general catch-all for any customer/developer feedback on the production system, which can be gathered in any number of ways.
Example Deployments
The deploy.sh script in both instances checks for the existence of the model in the shared volume and, if it is not found, copies the model content from the container into it.
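A minimal sketch of such a deploy.sh is below. The source path inside the image and the marker file used to detect an existing deployment are illustrative assumptions, not the exact script behind the examples.

```shell
#!/bin/sh
# Sketch of deploy.sh: copy the model content baked into the
# bootstrap container onto the shared volume, but only when it is
# not already present. The source path and the ".deployed" marker
# file are illustrative assumptions.
deploy_model() {
  src="$1"; dest="$2"
  if [ ! -e "$dest/.deployed" ]; then
    cp -R "$src/." "$dest/"
    : > "$dest/.deployed"        # marker so re-runs are no-ops
  fi
}
```

Making the copy idempotent matters because both schedulers may restart the bootstrap task, and re-copying a large model on every restart is wasted work.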
Kubernetes
apiVersion: v1
kind: Pod
metadata:
  name: simple-ml-app
  labels:
    app: mlapp
spec:
  containers:
    - name: ml-application
      image: mlapp
      volumeMounts:
        - name: model
          mountPath: /model/
  initContainers:
    - name: model-container
      image: model:1.43.13
      command: ['sh', '-c', '/deploy.sh']
      volumeMounts:
        - name: model
          mountPath: /model/
  volumes:
    - name: model
      emptyDir: {}
Nomad
job "mlapp1" {
  datacenters = ["dc1"]
  type = "service"
  group "mlapp" {
    volume "models" {
      type = "host"
      source = "models"
      read_only = false
    }
    task "ml-bootstrap" {
      driver = "docker"
      config {
        image = "model:1.43.13"
        command = "sh"
        args = ["-c", "/deploy.sh"]
      }
      resources {
        cpu = 200
        memory = 128
      }
      volume_mount {
        volume = "models"
        destination = "/model"
      }
    }
    task "ml-service" {
      driver = "docker"
      config {
        image = "mlapp"
        command = "sh"
        args = ["-c", "echo The service is running! && while true; do sleep 2; done"]
      }
      resources {
        cpu = 200
        memory = 128
      }
      service {
        name = "mlapp1"
      }
      volume_mount {
        volume = "models"
        destination = "/model"
      }
    }
  }
}
Hopefully this gives you some ideas on how a system for machine learning models can operate at scale without large human resource overheads. Obviously, these steps could be part of a CI/CD pipeline, an ML pipeline, or both.