Understanding Concept Drift and Data Drift for Robust Machine Learning Models
Concept drift is a phenomenon in which the relationship between the model's inputs and the target variable (y) it predicts, formally P(y | X), changes over time.
Data drift, often referred to as virtual drift, occurs when the statistical properties of the inputs, P(X), change while the input-target relationship stays the same. In the presence of drift, models built on historical data become stale, and their assumptions must be revised against current data. Figure 1 illustrates the distinction between concept drift and virtual (data) drift.
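The distinction above can be made concrete with a minimal simulation (illustrative numbers and thresholds are my own, not from the article): under data drift the input distribution P(X) shifts while the labeling rule P(y | X) is unchanged, whereas under concept drift the inputs look the same but the labeling rule itself moves.

```python
import numpy as np

rng = np.random.default_rng(0)

def label(x, threshold):
    # P(y | x): label is 1 when the feature exceeds the decision threshold.
    return (x > threshold).astype(int)

# Training period: inputs ~ N(0, 1), labeling rule uses threshold 0.0.
x_train = rng.normal(0.0, 1.0, 10_000)
y_train = label(x_train, threshold=0.0)

# Data (virtual) drift: P(X) shifts, but the input-label relation is unchanged.
x_shifted = rng.normal(1.0, 1.0, 10_000)
y_data_drift = label(x_shifted, threshold=0.0)

# Concept drift: P(X) is unchanged, but P(y | X) moves (the threshold changes).
x_same = rng.normal(0.0, 1.0, 10_000)
y_concept_drift = label(x_same, threshold=0.5)

print(f"positive rate at training time:    {y_train.mean():.2f}")
print(f"positive rate under data drift:    {y_data_drift.mean():.2f}")
print(f"positive rate under concept drift: {y_concept_drift.mean():.2f}")
```

Either kind of drift moves the observed positive rate away from its training-time value, which is why the monitoring described later compares current-window statistics against a saved reference.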
At the business level, drift manifests in many use cases. For instance:

- Spam and fraud detection: attackers continually change tactics, so the patterns the model learned no longer match new malicious behavior.
- Demand forecasting: consumer preferences, pricing, and seasonality shift, invalidating historical sales patterns.
- Recommendation systems: user interests evolve, so past interactions stop predicting current engagement.
Concept drift changes can take different forms:

- Sudden (abrupt) drift: the concept changes all at once, for example after a policy change takes effect overnight.
- Gradual drift: the old and new concepts alternate for a period before the new one takes over.
- Incremental drift: the concept changes continuously in small steps.
- Recurring drift: a previously seen concept reappears, as with seasonal patterns.
Figure 2 illustrates how model drift detection works. The system collects model inputs and outputs, computes statistics over a time window, and compares them against either the reference statistics saved during training or statistics from an earlier window. The monitoring system stores per-feature statistics and quantifies the drift level with metrics such as the Kolmogorov–Smirnov test, Kullback–Leibler divergence, Jensen–Shannon divergence, Hellinger distance, standard score (Z-score), chi-squared test, and total variation distance.
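As a sketch of two of the metrics listed above, the snippet below compares a reference window (standing in for statistics saved at training time) against a current window with a simulated mean shift, using SciPy's Kolmogorov–Smirnov test and Jensen–Shannon distance. The window sizes, bin count, and 0.05 significance threshold are illustrative choices, not values from the article.

```python
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(42)

# Reference window: the feature's distribution saved at training time.
reference = rng.normal(loc=0.0, scale=1.0, size=5000)

# Current window: the same feature after a shift in mean (simulated drift).
current = rng.normal(loc=0.5, scale=1.0, size=5000)

# Kolmogorov–Smirnov test compares the two empirical distributions directly.
ks_stat, p_value = ks_2samp(reference, current)
drift_detected = p_value < 0.05  # illustrative significance threshold

# Jensen–Shannon distance needs binned (discrete) distributions.
bins = np.histogram_bin_edges(np.concatenate([reference, current]), bins=30)
p, _ = np.histogram(reference, bins=bins, density=True)
q, _ = np.histogram(current, bins=bins, density=True)
js_distance = jensenshannon(p, q)

print(f"KS statistic={ks_stat:.3f}, p={p_value:.3g}, drift={drift_detected}")
print(f"Jensen–Shannon distance={js_distance:.3f}")
```

In a production monitor, this comparison would run per feature on each new window, with alerts fired when the chosen metric crosses its threshold.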