Concept Drift: A Pitfall of Machine Learning You Need to Know About
In today’s fast-paced world of machine learning and AI, it’s crucial for data scientists to be as agile and nimble as possible. After all, the pace of business is ever accelerating. But what happens when your model ceases to give accurate predictions? Even the best ML models can suffer from a phenomenon known as concept drift, where the patterns a model learned during training no longer match the data it sees in production, so its predictions gradually stop matching reality.
Wrong predictions not only cost you hours of research and testing but also signal that your model has drifted away from its original purpose. In this blog post, you will learn what concept drift is, why it happens in machine learning models, examples of concept drift in real life, and steps you can take to prevent it from happening in your own ML models.
What is Concept Drift?
Concept drift refers to a phenomenon in machine learning that causes a model to drift away from its original task. The model’s underlying understanding of the problem at hand can become outdated as the environment or the target variable changes over time, even if the input data itself does not change much.
"In the real world concepts are often not stable but change with time. Typical examples of this are weather prediction rules and customers’ preferences. The underlying data distribution may change as well. Often these changes make the model built on old data inconsistent with the new data, and regular updating of the model is necessary. This problem is known as concept drift." — Alexey Tsymbal
There are different types of concept drift that can occur: abrupt, gradual, incremental, and recurrent.
Let's take a look at an example of abrupt drift in face ID to get a better understanding. A face identification model trained only on unmasked faces will suddenly fail to identify the same person once they start wearing a mask.
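To make abrupt drift concrete, here is a minimal, self-contained toy simulation (all names and the flipped decision rule are illustrative, not from any real face ID system): a fixed model learns one concept, then the true concept flips abruptly and the model's accuracy collapses.

```python
import random

random.seed(0)

def true_label(x, drifted):
    # Before the drift the true concept is "x > 0.5"; after it, the rule flips.
    return (x > 0.5) if not drifted else (x < 0.5)

def model_predict(x):
    # A frozen model trained on pre-drift data: it learned "x > 0.5".
    return x > 0.5

def accuracy(drifted, n=10_000):
    hits = 0
    for _ in range(n):
        x = random.random()
        hits += model_predict(x) == true_label(x, drifted)
    return hits / n

print(f"accuracy before drift: {accuracy(drifted=False):.2f}")  # 1.00
print(f"accuracy after drift:  {accuracy(drifted=True):.2f}")   # 0.00
```

The model itself never changed; only the concept it was trained on did, which is exactly what makes abrupt drift so damaging.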
In industrial process control, gradual and incremental drift appears step by step. Its importance here cannot be overstated: even if a model receives the same input data, the underlying state of the machine may change over time due to wear and tear of moving parts, changes in the viscosity of lubricants, or any number of other factors.
The following diagram shows how process control can be optimized using constrained optimization techniques applied to a simulation model of the actual process (please read my article on Constrained Optimization Techniques for more details). The simulation model receives sensor data in real time and attempts to mimic what happens in the real world. The optimizer produces correct control actions as long as the simulation matches reality, but as soon as the state of the machine changes due to wear and tear of the hardware, the control is no longer optimal, which can lead to fatal defects.
Recurrent drift can be observed in the seasonality of time series forecasting models.
A model typically drifts when it is not properly monitored and does not receive feedback from users or from the environment where it is deployed.
Concept drift can result in poor model performance or incorrect results. Data scientists can minimize the risk of concept drift by monitoring their models and making adjustments as needed. The first step to preventing concept drift is to recognize the signs of a model drifting away from its original task.
How Does Concept Drift Occur?
When building a model, data scientists often make assumptions about the data they’ll use. They might assume that data will remain relatively consistent over time, or that it will contain certain patterns. If the data doesn’t meet these expectations, the model can drift away from its original task.
How to Identify Concept Drift
There are several steps you can take to prevent concept drift in your machine learning models. First, build in monitoring and adjust as needed based on the results. This can include regular model retraining and assessing your data to see whether it changes over time. When using neural networks, data scientists should also monitor the hyperparameters used.
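As a starting point for the monitoring step described above, here is a minimal sketch of a rolling-accuracy monitor. The class name, window size, and thresholds are all illustrative assumptions, not part of any particular monitoring product; a production system would track more KPIs than accuracy alone.

```python
from collections import deque

class RollingAccuracyMonitor:
    """Tracks accuracy over a sliding window of recent predictions and
    flags a possible drift when the windowed accuracy falls below a
    baseline by more than a chosen margin."""

    def __init__(self, window=200, baseline=0.90, margin=0.10):
        self.window = deque(maxlen=window)
        self.baseline = baseline
        self.margin = margin

    def update(self, prediction, actual):
        self.window.append(prediction == actual)
        return self.accuracy()

    def accuracy(self):
        return sum(self.window) / len(self.window) if self.window else 1.0

    def drift_suspected(self):
        # Only raise an alarm once the window is full, to avoid noisy starts.
        return (len(self.window) == self.window.maxlen
                and self.accuracy() < self.baseline - self.margin)

# Illustrative stream: the model's predictions break halfway through.
monitor = RollingAccuracyMonitor(window=100, baseline=0.95, margin=0.15)
for i in range(300):
    actual = i % 2
    prediction = actual if i < 150 else 1 - actual
    monitor.update(prediction, actual)
print(monitor.drift_suspected())  # True
```

Because the window only holds recent outcomes, old (pre-drift) successes age out and the alarm reflects current behavior rather than lifetime averages.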
This is a sample workflow from Azure Databricks for monitoring ML models.
Once a monitoring system is in place, what is the best way to determine whether the concept is drifting?
Various techniques can be used to evaluate drift in the models' key performance indicators, such as accuracy and F1 score. In her paper "A Review on Concept Drift," Vaishali Suryawanshi describes "algorithms allowing to detect concept drift, known as concept drift detectors" and surveys several of them: the Drift Detection Method (DDM), proposed by Gama et al., which uses the Binomial distribution; EDDM, a modification of DDM; and ADWIN, which uses sliding windows of variable size that are recomputed online according to the rate of change observed in the data.
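To illustrate the DDM idea mentioned above, here is a minimal from-scratch sketch of its warning/drift thresholds. This is a simplified reading of the method described by Gama et al. (the `min_samples` cutoff and the reset-on-drift behavior are simplifications of mine), not a drop-in replacement for a library implementation such as scikit-multiflow's.

```python
import math

class DDM:
    """Minimal sketch of the Drift Detection Method (Gama et al.).
    Consumes a stream of 0/1 prediction errors and signals drift when
    the error rate rises significantly above its historical minimum."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0          # running error rate
        self.s = 0.0          # its standard deviation
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the model's prediction was wrong, else 0.
        Returns 'drift', 'warning', or 'stable'."""
        self.n += 1
        # Incremental estimate of the Binomial error rate and its std dev.
        self.p += (error - self.p) / self.n
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + 3 * self.s_min:
            self.reset()  # drift confirmed: start a fresh baseline
            return "drift"
        if self.p + self.s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "stable"
```

Fed a stream with a low error rate followed by a sudden jump, the detector stays "stable" through the quiet phase and reports "drift" shortly after the error rate rises.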
How to prevent Concept Drift
1. Retraining:
It is up to the MLOps team to decide what the model performance KPIs and thresholds are. When the model fails to meet these thresholds, retraining can be performed to help restore the performance.
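The threshold-then-retrain loop above can be sketched as a small helper. Everything here is a placeholder: the KPI names, the `retrain_fn` callback, and the string "models" stand in for whatever your MLOps pipeline actually uses.

```python
def check_and_retrain(model, kpis, thresholds, retrain_fn, recent_data):
    """Compare model KPIs against MLOps-defined thresholds and trigger
    retraining when any KPI falls below its threshold.
    Returns the (possibly retrained) model and the failing KPIs."""
    failing = {name: value for name, value in kpis.items()
               if value < thresholds.get(name, 0.0)}
    if failing:
        # Retrain on a recent window of data so the model re-learns the
        # current concept rather than the one it drifted away from.
        return retrain_fn(model, recent_data), failing
    return model, {}

# Illustrative usage with placeholder values: F1 has dropped below threshold.
thresholds = {"accuracy": 0.90, "f1": 0.85}
kpis = {"accuracy": 0.97, "f1": 0.72}
model, failing = check_and_retrain(
    model="old_model",
    kpis=kpis,
    thresholds=thresholds,
    retrain_fn=lambda m, data: "retrained_model",
    recent_data=["recent", "examples"],
)
print(model, failing)  # retrained_model {'f1': 0.72}
```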
2. Stream Learning / Incremental Learning:
In incremental learning, the input data is continuously used to extend the existing model's knowledge, i.e. to further train the model. It helps reduce concept drift because the model continuously adapts to the changing real-world environment. Scikit-Multiflow supports stream learning; please visit the documentation here to explore this further.
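To show the idea without any dependencies, here is a tiny incremental learner written from scratch (a one-feature online logistic regression; the class and its methods are my own illustration, not Scikit-Multiflow's API, though libraries expose the same one-example-at-a-time pattern). It learns one concept, the concept flips mid-stream, and because every new example updates the weights, the model adapts to the new concept.

```python
import math
import random

class OnlineLogisticRegression:
    """A minimal incremental learner: each new example applies one
    stochastic-gradient step, so the model keeps adapting as the
    data stream drifts."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        # One SGD step on the log-loss for this single example.
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error

random.seed(1)
model = OnlineLogisticRegression(n_features=1)
# Phase 1: the concept is y = 1 when x > 0.5. Phase 2: the concept flips.
for i in range(4000):
    x = [random.random()]
    y = int(x[0] > 0.5) if i < 2000 else int(x[0] < 0.5)
    model.learn_one(x, y)
# After enough post-drift examples, the model follows the new concept,
# predicting 1 for small x and 0 for large x.
print(model.predict([0.1]), model.predict([0.9]))
```

A batch-trained model frozen after phase 1 would stay wrong forever; the incremental learner recovers on its own, which is exactly why stream learning mitigates drift.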
3. Multi-Objective Reinforcement Learning:
This approach combines concept drift learning with multi-objective reinforcement learning to produce an unsupervised technique for learning in non-stationary, multi-objective environments.
"Real world environments are non-stationary and include unanticipated changes that contradict what an agent has previously learned. In machine learning, this is called concept drift, a partially-observable change when the environment modifies without notification. Concept drift causes several problems: agents with a decaying exploration fail to adapt, while agents capable of adapting may overfit to noise or overwrite previously learned knowledge. Agents in such environments must take steps to mitigate both problems. This problem is compounded because agents typically have multiple tasks to accomplish simultaneously, requiring multi-objective optimization." — Webber, Frederick C.
What's Next
There’s still more to learn about concept drift in machine learning. You can investigate how different algorithms respond to concept drift and how they can be corrected. You can also further investigate ways to prevent concept drift in your models and how to respond if a model does drift away from its original task. There are many fascinating aspects of machine learning, and concept drift is just one of them.