MLOps: Repeatability in Production ML Systems
Why "It Worked Yesterday" Is Not a Valid State
Most machine learning systems perform well in controlled environments.
They train successfully. They pass evaluation metrics. They produce expected outputs on test data.
Then they are deployed.
And something subtle changes.
Not dramatically. Not immediately. But enough to introduce inconsistency.
The same pipeline, with the same intent, begins to produce different results.
At first, this is dismissed as noise. Later, it becomes a pattern. Eventually, it becomes a problem no one can precisely explain and no one can reliably reproduce to fix.
What Repeatability Actually Means
Repeatability is often equated with experimental reproducibility.
In production systems, the requirement is stricter.
Given the same inputs, the system should produce the same outputs consistently across time and environments.
This requirement extends beyond the model. It includes data ingestion, feature generation, model inference, configuration, and the execution environment itself.
If any of these components are not controlled, repeatability breaks. It usually breaks quietly.
What Happens When Repeatability Is Missing
The absence of repeatability rarely appears as a single failure. It emerges through patterns, and teams that have lived through these recognize them immediately.
• Same pipeline, different outputs
A pipeline is executed with identical intent on two different days. The outputs are not the same.
There is no clear record of what changed. No versioned data snapshot. No traceable configuration difference. What appears stable at the surface is not reproducible underneath.
Common underlying causes include:
• source data updated in place, with no versioned snapshot
• floating library dependencies that resolved differently between runs
• unseeded randomness in training or preprocessing
• configuration changes that were never recorded
• Works locally, behaves differently in production
A model behaves as expected in a local environment. When deployed, the same pipeline produces different results. The difference is not in the model itself, but in the surrounding system.
Differences often come from:
• mismatched library or runtime versions between environments
• different hardware or OS-level numeric behavior
• environment variables and defaults that diverge from local settings
• data paths that resolve to different content in each environment
The system is technically functional, but operationally inconsistent.
• Debugging becomes reconstruction
An issue is reported in production. The first question should be simple: what changed?
Without repeatability, this question is difficult to answer. Teams begin reconstructing past runs. Which data was used. Which model version was loaded. Which configuration was active.
Diagnosis becomes archaeology. Hours are spent recovering context that should never have been lost.
• Hidden dependencies solidify into invisible infrastructure
In the absence of structured pipelines, systems begin to rely on implicit behavior:
• environment variables that no one documents
• files assumed to exist at fixed local paths
• manual preparation steps performed from memory
• state left behind by previous runs
The system continues to function, but only under conditions that are neither visible nor controlled.
Why This Is Not a Modeling Problem
When outputs become inconsistent, the instinct is to revisit the model.
In many cases, the model is not the source of the issue. The system around it is.
Repeatability is a systems property. Without it, even a well-performing model will behave unpredictably.
You cannot tune your way out of an infrastructure problem.
Designing for Repeatability
Repeatability does not emerge by default. It must be engineered deliberately. Teams that skip this step eventually find themselves doing reconstruction work under pressure.
Version everything that affects output. This includes:
• data snapshots and feature definitions
• training code and model artifacts
• configuration and hyperparameters
• dependency and runtime versions
If it influences output, it must be versioned.
An S3 path without a version identifier is not a data source. It is a variable.
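As a minimal sketch of what a pinned data reference can look like: the names here (`DatasetRef`, `verify_snapshot`) are illustrative, not from any particular library. The point is that a data reference carries both a version identifier and a content hash, so drift is detected instead of silently absorbed.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """Immutable reference to one specific snapshot of a dataset."""
    uri: str          # e.g. an S3 prefix
    snapshot_id: str  # version identifier assigned at ingestion time
    sha256: str       # content hash of the snapshot manifest


def verify_snapshot(ref: DatasetRef, manifest_bytes: bytes) -> None:
    """Fail loudly if the data behind the reference has drifted."""
    actual = hashlib.sha256(manifest_bytes).hexdigest()
    if actual != ref.sha256:
        raise ValueError(
            f"{ref.uri}@{ref.snapshot_id} changed: "
            f"expected {ref.sha256}, got {actual}"
        )
```

With a reference like this, a pipeline can refuse to run against data it cannot verify, rather than producing an output no one can trace.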
Enforce deterministic pipelines.
Determinism is a requirement, not an optimization.
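A minimal sketch of what enforcing this can look like in Python, assuming NumPy is part of the stack; the same idea extends to whatever frameworks are in use:

```python
import os
import random
from pathlib import Path

import numpy as np


def pin_randomness(seed: int = 42) -> None:
    """Fix every source of randomness the pipeline controls."""
    random.seed(seed)
    np.random.seed(seed)
    # Propagated to subprocesses only; the current process must be
    # launched with PYTHONHASHSEED already set for it to apply here.
    os.environ["PYTHONHASHSEED"] = str(seed)


def list_inputs(data_dir: str) -> list[Path]:
    """Filesystem listing order is not guaranteed; sort it."""
    return sorted(Path(data_dir).glob("*.parquet"))
```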
Standardize execution environments.
A model that works locally should not behave differently in production.
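One lightweight guard, sketched here with illustrative pins (in practice they would be generated from a lockfile), is to refuse to run when the runtime drifts from the pinned environment:

```python
from importlib.metadata import version

# Illustrative pins; in a real system these come from a lockfile.
PINNED = {"numpy": "1.26.4", "scikit-learn": "1.4.2"}


def assert_environment(pins: dict[str, str]) -> None:
    """Fail fast if installed versions do not match the pins."""
    drift = {
        pkg: {"expected": expected, "installed": version(pkg)}
        for pkg, expected in pins.items()
        if version(pkg) != expected
    }
    if drift:
        raise RuntimeError(f"Environment drift detected: {drift}")
```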
Eliminate implicit state.
No undocumented environment variables, no manual preparation steps, no files assumed to already exist.
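A small illustration of the difference, with hypothetical keys: implicit state reads whatever happens to be in the environment, while explicit state declares its inputs and fails loudly when one is missing.

```python
import json

# Implicit: the run silently depends on whatever the environment contains.
#   threshold = float(os.environ.get("THRESHOLD", "0.5"))


# Explicit: every run declares its configuration, and gaps fail loudly.
def load_config(path: str) -> dict:
    with open(path) as f:
        config = json.load(f)
    required = {"threshold", "model_version", "data_snapshot"}
    missing = required - config.keys()
    if missing:
        raise KeyError(f"Config missing required keys: {sorted(missing)}")
    return config
```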
Make every run traceable.
Every execution should answer what data was used, what model version was applied, what configuration was active, and what output was produced. Without traceability, repeatability cannot be verified.
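A run manifest can be as simple as a JSON record written next to every output. This sketch assumes the identifiers are known at run time; the field names are illustrative, not a standard schema:

```python
import hashlib
import json
import time
from pathlib import Path


def write_run_manifest(run_dir: str, data_snapshot: str,
                       model_version: str, config: dict,
                       output_path: str) -> Path:
    """Record what produced this output, plus a hash of the output itself."""
    manifest = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "data_snapshot": data_snapshot,
        "model_version": model_version,
        "config": config,
        "output_sha256": hashlib.sha256(
            Path(output_path).read_bytes()
        ).hexdigest(),
    }
    manifest_path = Path(run_dir) / "manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest_path
```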
A Simple Test
Can you rerun a pipeline from six months ago and obtain the same result?
If the answer depends on conditions, assumptions, or reconstruction, repeatability is not established.
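Given manifests like the one sketched above, the test reduces to comparing recorded output hashes across runs:

```python
import json
from pathlib import Path


def same_result(run_a: str, run_b: str) -> bool:
    """Two runs are repeatable if their output hashes match."""
    a = json.loads((Path(run_a) / "manifest.json").read_text())
    b = json.loads((Path(run_b) / "manifest.json").read_text())
    return a["output_sha256"] == b["output_sha256"]
```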
Closing Thought
Production ML systems do not fail because they are inaccurate.
They fail because they are inconsistent.
Accuracy can be measured. Inconsistency cannot be controlled without deliberate system design.
Repeatability is the first and most foundational layer of a reliable ML system. Without it, automation becomes unpredictable, observability loses its signal, and reliability cannot be sustained.
Every improvement layered on top of an unrepeatable system rests on unstable ground.
This is Part 1 of the RAOR series: Repeatability, Automation, Observability, Reliability.
Part 2 covers Automation: how to move from manually operated pipelines to systems that run consistently without intervention.
If your team has hit any of these patterns in production, I would be interested to hear how you approached it. Drop a comment below.
What has been the hardest part of maintaining consistency in production ML systems? Data versioning, environment drift, or something else?