How to detect, evaluate and visualize historical drifts in the data

We often talk about detecting drift on live data.

The goal is then to check if the current distributions deviate from training or some past period. When drift is detected, you know that your model operates in a relatively unknown space. It's time to do something about it.

But there are a lot of nuances here. How big of a drift is DRIFT? Should I care if only 10% of my features drifted? Should I look at drift week-by-week or month-by-month?

The devil is in the details. The answer will heavily depend on the model, use case, how easy (or possible) it is to retrain, and how much the performance drop costs you.

Here is a process that might help you think through it.

Let's look at historical data drift!

Why look at past data drift?

Our goal is to learn drift dynamics. How much did our data change in the past?

This is useful for two reasons:

First, we can understand the model decay profile.

[Image: ML model quality over time]

We shared a similar idea on how you could check for model retraining needs in advance. There, we looked at how model performance changes over time.

Now, we look at how the data changes. You might prefer one approach to the other. Understanding data drift is especially helpful if we know we'll have to wait for the ground truth labels in production.

Assuming that the data changes at some constant pace, we can use this analysis to set our expectations. And prepare the proper infrastructure.

Is the use case highly dynamic? Should we get ready for frequent retraining and build automatic pipelines?

Second, this analysis can help us define the model monitoring triggers.

We might want to check for data drift once the model is in production. How sensitive should our triggers be? What needs to happen for us to react to drift?
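One common way to frame such a trigger is to alert only when the share of drifted features crosses a chosen threshold. Here is a minimal sketch of that idea; the function name, the two-sample Kolmogorov-Smirnov test, and both threshold values are illustrative assumptions, not the tutorial's exact logic.

```python
import numpy as np
from scipy import stats

def drift_alert(reference: np.ndarray, current: np.ndarray,
                p_threshold: float = 0.05,
                share_threshold: float = 0.5) -> bool:
    """Fire an alert when the share of drifted features crosses share_threshold.

    reference, current: 2D arrays of shape (n_samples, n_features).
    A feature counts as drifted if a two-sample KS test rejects at p_threshold.
    Both thresholds are illustrative: tune them to your use case.
    """
    n_features = reference.shape[1]
    drifted = 0
    for i in range(n_features):
        _, p_value = stats.ks_2samp(reference[:, i], current[:, i])
        if p_value < p_threshold:
            drifted += 1
    return drifted / n_features >= share_threshold

# Synthetic check: shift every feature by a large constant.
rng = np.random.default_rng(0)
ref = rng.normal(0, 1, size=(500, 4))
cur = ref + 3.0  # clear drift in all features
print(drift_alert(ref, cur))   # drift in 4/4 features -> alert
print(drift_alert(ref, ref))   # identical data -> no alert
```

Raising `share_threshold` makes the trigger less sensitive; lowering `p_threshold` makes each individual feature test stricter.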


How can you approach this? We made a tutorial for that. 

We explain:

  • How you can code your drift detection logic with a custom function
  • How you can run the experiments and visualize them
  • How you can log the results

All using open-source tools!
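To give a flavor of the custom-function approach, here is a self-contained sketch that replays a synthetic history month by month and computes the share of drifted features against the first month as reference. The column names, the per-feature KS test, and the synthetic gradual shift are all assumptions for illustration; see the tutorial for the actual implementation.

```python
import numpy as np
import pandas as pd
from scipy import stats

def drifted_share(reference: pd.DataFrame, current: pd.DataFrame,
                  p_threshold: float = 0.05) -> float:
    """Share of numeric features whose two-sample KS test rejects at p_threshold."""
    drifted = [
        stats.ks_2samp(reference[col], current[col]).pvalue < p_threshold
        for col in reference.columns
    ]
    return sum(drifted) / len(drifted)

# Synthetic history: 6 months of data whose features gradually shift.
rng = np.random.default_rng(42)
months = pd.period_range("2021-01", periods=6, freq="M")
frames = []
for i, m in enumerate(months):
    df = pd.DataFrame(rng.normal(loc=0.3 * i, scale=1.0, size=(200, 3)),
                      columns=["f1", "f2", "f3"])
    df["month"] = m
    frames.append(df)
history = pd.concat(frames, ignore_index=True)

# Use the first month as the reference period, compare each later month to it.
reference = history[history["month"] == months[0]].drop(columns="month")
for m in months[1:]:
    current = history[history["month"] == m].drop(columns="month")
    print(m, drifted_share(reference, current))
```

The printed shares grow as the distributions shift further from the reference period, which is exactly the drift dynamic the analysis is meant to reveal; plotting them over time gives the historical drift picture.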

Head here for the full blog post: https://evidentlyai.com/blog/tutorial-3-historical-data-drift

And a Jupyter notebook with all the code: https://github.com/evidentlyai/evidently/blob/main/evidently/tutorials/historical_drift_visualization.ipynb 


More articles by Emeli Dral
