How to detect, evaluate and visualize historical drifts in the data

We often talk about detecting drift on live data.

The goal is then to check if the current distributions deviate from training or some past period. When drift is detected, you know that your model operates in a relatively unknown space. It's time to do something about it.

But there are a lot of nuances here. How big of a drift is DRIFT? Should I care if only 10% of my features drifted? Should I look at drift week-by-week or month-by-month?

The devil is in the details. The answer will heavily depend on the model, use case, how easy (or possible) it is to retrain, and how much the performance drop costs you.

Here is a process that might help you think through it.

Let's look at historical data drift!

Why look at past data drift?

Our goal is to learn drift dynamics. How much did our data change in the past?

This is useful for two reasons:

First, we can understand the model decay profile.

[Image: ML model quality over time]

We shared a similar idea on how you could check for model retraining needs in advance. There, we looked at how model performance changes over time.

Now, we look at how the data changes. You might prefer one approach to the other. Understanding data drift is especially helpful if we know we'll have to wait for the ground truth labels in production.

Assuming that the data changes at some constant pace, we can use this analysis to set our expectations. And prepare the proper infrastructure.

Is the use case highly dynamic? Should we get ready for frequent retraining and build automatic pipelines?

Second, this analysis can help us define the model monitoring triggers.

We might want to check for data drift once the model is in production. How sensitive should our triggers be? What needs to happen for us to react to drift?
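One common way to frame such a trigger is to alert only when the share of drifted features crosses a chosen threshold. Here is a minimal sketch of that idea; the function name, the two-sample Kolmogorov-Smirnov test, and both threshold values are illustrative assumptions, not the tutorial's exact logic.

```python
import numpy as np
from scipy import stats

def drift_alert(reference: np.ndarray, current: np.ndarray,
                p_threshold: float = 0.05,
                share_threshold: float = 0.5) -> bool:
    """Fire an alert when the share of drifted features crosses share_threshold.

    reference, current: 2D arrays of shape (n_samples, n_features).
    A feature counts as drifted if a two-sample KS test rejects at p_threshold.
    Both thresholds are illustrative: tune them to your use case.
    """
    n_features = reference.shape[1]
    drifted = 0
    for i in range(n_features):
        _, p_value = stats.ks_2samp(reference[:, i], current[:, i])
        if p_value < p_threshold:
            drifted += 1
    return drifted / n_features >= share_threshold

# Synthetic check: shift every feature by a large constant.
rng = np.random.default_rng(0)
ref = rng.normal(0, 1, size=(500, 4))
cur = ref + 3.0  # clear drift in all features
print(drift_alert(ref, cur))   # drift in 4/4 features -> alert
print(drift_alert(ref, ref))   # identical data -> no alert
```

Raising `share_threshold` makes the trigger less sensitive; lowering `p_threshold` makes each individual feature test stricter.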


How can you approach this? We made a tutorial for that. 

We explain:

  • How you can code your drift detection logic with a custom function
  • How you can run the experiments and visualize them
  • How you can log the results

All using open-source tools!
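To give a flavor of the custom-function approach, here is a self-contained sketch that replays a synthetic history month by month and computes the share of drifted features against the first month as reference. The column names, the per-feature KS test, and the synthetic gradual shift are all assumptions for illustration; see the tutorial for the actual implementation.

```python
import numpy as np
import pandas as pd
from scipy import stats

def drifted_share(reference: pd.DataFrame, current: pd.DataFrame,
                  p_threshold: float = 0.05) -> float:
    """Share of numeric features whose two-sample KS test rejects at p_threshold."""
    drifted = [
        stats.ks_2samp(reference[col], current[col]).pvalue < p_threshold
        for col in reference.columns
    ]
    return sum(drifted) / len(drifted)

# Synthetic history: 6 months of data whose features gradually shift.
rng = np.random.default_rng(42)
months = pd.period_range("2021-01", periods=6, freq="M")
frames = []
for i, m in enumerate(months):
    df = pd.DataFrame(rng.normal(loc=0.3 * i, scale=1.0, size=(200, 3)),
                      columns=["f1", "f2", "f3"])
    df["month"] = m
    frames.append(df)
history = pd.concat(frames, ignore_index=True)

# Use the first month as the reference period, compare each later month to it.
reference = history[history["month"] == months[0]].drop(columns="month")
for m in months[1:]:
    current = history[history["month"] == m].drop(columns="month")
    print(m, drifted_share(reference, current))
```

The printed shares grow as the distributions shift further from the reference period, which is exactly the drift dynamic the analysis is meant to reveal; plotting them over time gives the historical drift picture.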

Head here for the full blog post: https://evidentlyai.com/blog/tutorial-3-historical-data-drift

And a Jupyter notebook with all the code: https://github.com/evidentlyai/evidently/blob/main/evidently/tutorials/historical_drift_visualization.ipynb 


More articles by Emeli Dral
