MLOps - Production Models
Machine learning is an amazing tool for tackling some of the biggest problems we now face, but how do you release and maintain models that are constantly changing, growing, or even running in different 'run-modes'?
A 'typical' machine learning pipeline will, by its nature, have an iterative cycle with many possible release candidates, upgrades and production models. The following is what I consider the lifecycle of a model.
This lifecycle can have overlap, and it can depend on the model build/test/promote cycle (some teams keep training continuously and create 'snapshots' as releases).
Ingesting Data
Ingestion depends on the data, but most pipelines go through some level of validation, labelling, bias correction and anonymisation. Generally, the raw data is collected and then either labelled or moved into separate storage for actual training.
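As a minimal sketch of that acceptance gate, the function below accepts a raw file into the training store only if it passes a trivial validation check. The CSV check and the directory layout are illustrative assumptions, not a prescribed format.

```shell
#!/bin/sh
# Sketch of an ingest gate: move a raw file into training storage
# only if it is non-empty and the header row looks like CSV.
# The check and paths are illustrative assumptions.
ingest_file() {
  f="$1"; dest="$2"
  if [ -s "$f" ] && head -n1 "$f" | grep -q ','; then
    mv "$f" "$dest/" && echo "accepted: $f"
  else
    echo "rejected: $f" >&2
    return 1
  fi
}
```

In a real pipeline this check would be replaced by schema validation, labelling and anonymisation steps appropriate to the data.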
Build Model
The build can produce a release candidate model either at the end of an iteration or as part of a 'checkpoint'. The candidate should be placed in the release candidate storage, with the version history used to determine what the next increment should be. The example above uses semantic versioning; obviously, there are other schemes (datetime, SHA, UUID4).
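Deriving the next increment from the version history can be as simple as a patch bump on the latest stored version. The sketch below assumes semantic versioning, as in the example above; how the latest version is looked up in your storage is left out.

```shell
#!/bin/sh
# Sketch of a semantic-version patch bump for a release candidate.
# The caller supplies the latest known version, e.g. from listing
# the release candidate storage (an illustrative assumption).
next_patch() {
  latest="$1"                    # e.g. "1.43.12"
  major=${latest%%.*}
  rest=${latest#*.}
  minor=${rest%%.*}
  patch=${rest#*.}
  echo "$major.$minor.$((patch + 1))"
}
```

So `next_patch 1.43.12` yields `1.43.13`, the model tag used in the deployment examples below.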
Test Model
The testing stage can be built into the training, but it can also be a separate external validation of the model. In either case, on success it should promote the model into validation; this is a gatekeeping step at which a failed model can be deleted or allowed to expire (depending on your storage).
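That gatekeeping step can be sketched as a promote-or-delete decision around whatever evaluation command you run. The evaluation command and storage layout here are illustrative assumptions.

```shell
#!/bin/sh
# Sketch of the test gate: run an evaluation command against the
# candidate and either promote it into validation storage or delete
# it. The evaluation command and paths are illustrative assumptions.
gate_model() {
  model="$1"; validated_dir="$2"; eval_cmd="$3"
  if "$eval_cmd" "$model"; then
    mv "$model" "$validated_dir/"   # promote to validation
  else
    rm -f "$model"                  # expire the failed candidate
    return 1
  fi
}
```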
Validate Model
The validation stage could be combined with testing, but it is specifically less about accuracy or model improvements and more about the performance and size of the model. Ideally, a performance/size threshold can be set so that if a model exceeds it, the model is never promoted to production and will expire or be deleted. If the model is successful, the version information should be marked in some way to denote a new production model, and a copy placed into the production model storage. Ideally, a dashboard or similar reporting on model performance should give feedback on any sudden changes or changes over time (which is important for production service level agreements etc).
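A size threshold of the kind described above can be a one-line check before promotion. The default budget here is an illustrative assumption; the real limit would come from your production SLAs.

```shell
#!/bin/sh
# Sketch of a size-budget check used during validation: succeed only
# if the model file is within the agreed byte limit.
# The default of 100 MiB is an illustrative assumption.
within_size_budget() {
  model="$1"; max_bytes="${2:-104857600}"
  size=$(wc -c < "$model")
  [ "$size" -le "$max_bytes" ]
}
```

A model failing this check would simply never be copied into production storage, so it expires with the rest of the rejected candidates.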
Release Model
This stage could simply be the ML application's release process, but if we are using containers to ship the model data, they can be built here and released to production. The release process could include a staged release strategy or a blue/green style deployment; it is at this stage that any release management on the system(s) takes effect.
Model Performance
The performance of the model in the production systems should be constantly evaluated from a host/application metrics point of view. This can help with improvements in the model engine and the model itself, and as the application matures you can start checking for edge cases or regressions in performance at scale. Feedback is a general catch-all for any customer/developer feedback on the production system, which can be gathered in any number of ways.
Example Deployments
The deploy.sh script in both instances checks for the existence of the model in the shared volume and, if it is not found, copies the model content from the container into it.
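A minimal sketch of such a deploy.sh is below. The source path inside the image and the marker file used to detect an existing deployment are illustrative assumptions, not the exact script behind the examples.

```shell
#!/bin/sh
# Sketch of deploy.sh: copy the model content baked into the
# bootstrap container onto the shared volume, but only when it is
# not already present. The source path and the ".deployed" marker
# file are illustrative assumptions.
deploy_model() {
  src="$1"; dest="$2"
  if [ ! -e "$dest/.deployed" ]; then
    cp -R "$src/." "$dest/"
    : > "$dest/.deployed"        # marker so re-runs are no-ops
  fi
}
```

Making the copy idempotent matters because both schedulers may restart the bootstrap task, and re-copying a large model on every restart is wasted work.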
Kubernetes
apiVersion: v1
kind: Pod
metadata:
  name: simple-ml-app
  labels:
    app: mlapp
spec:
  containers:
    - name: ml-application
      image: mlapp
      volumeMounts:
        - name: model
          mountPath: /model/
  initContainers:
    - name: model-container
      image: model:1.43.13
      command: ['sh', '-c', '/deploy.sh']
      volumeMounts:
        - name: model
          mountPath: /model/
  volumes:
    - name: model
      emptyDir: {}
Nomad
job "mlapp1" {
  datacenters = ["dc1"]
  type = "service"
  group "mlapp" {
    volume "models" {
      type = "host"
      source = "models"
      read_only = false
    }
    task "ml-bootstrap" {
      driver = "docker"
      config {
        image = "model:1.43.13"
        command = "sh"
        args = ["-c", "/deploy.sh"]
      }
      resources {
        cpu = 200
        memory = 128
      }
      volume_mount {
        volume = "models"
        destination = "/model"
      }
    }
    task "ml-service" {
      driver = "docker"
      config {
        image = "mlapp"
        command = "sh"
        args = ["-c", "echo The service is running! && while true; do sleep 2; done"]
      }
      resources {
        cpu = 200
        memory = 128
      }
      service {
        name = "mlapp1"
      }
      volume_mount {
        volume = "models"
        destination = "/model"
      }
    }
  }
}
Hopefully this gives you some ideas on how a system for machine learning models can operate at scale without large human resource overheads. Obviously, these steps could be part of a CI/CD pipeline, an ML pipeline, or both.