Machine Learning Model Deployment Challenges: Data Drift and Concept Drift Examples
Your Machine Learning model does well on the training set, and it also does well on the test set. It's time to celebrate. But it's not all over yet: Machine Learning model deployment has its own challenges.
Among the statistical challenges are Data Drift and Concept Drift. Let's understand them through examples.
1) Let's say you're building a computer vision system to detect cracks in parts being manufactured in a workshop or factory unit. The model detects cracks in a part, and based on the size of the crack (defect) the Mechanical Engineers accept the part, reject it, or accept it with an overhaul.
Typically the manufacturing unit has a device with software and cameras that takes pictures of each manufactured part and sends them (through an API call) to the server where the ML model is deployed. The Machine Learning model makes the prediction and sends the result back to the software through an API interface. You trained the model on n such images, it performed well, and you deployed it.
Now, suppose the lighting in the manufacturing unit changes, so the pictures come out differently. This means the distribution of the input data has changed. Lighting like this might not have been represented in your training or test data, so the model might fail to perform well. This is Data Drift: the data distribution has changed.
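One way to catch a lighting change like this is to compare a simple image statistic from production against the same statistic from the training data. The sketch below is only an illustration under stated assumptions: the brightness values are simulated rather than taken from real images, the names train_brightness, prod_same_lighting, prod_new_lighting and DRIFT_THRESHOLD are hypothetical, and the threshold value of 0.2 is arbitrary. It uses the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical distributions.

```python
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic:
    the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    largest_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Move both pointers past every value <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        largest_gap = max(largest_gap, abs(i / len(a) - j / len(b)))
    return largest_gap

rng = random.Random(42)

# Simulated mean brightness per image (0-255 scale). Assumed values, not real data.
train_brightness = [rng.gauss(120, 10) for _ in range(500)]
prod_same_lighting = [rng.gauss(120, 10) for _ in range(500)]
prod_new_lighting = [rng.gauss(90, 10) for _ in range(500)]  # dimmer lighting

DRIFT_THRESHOLD = 0.2  # hypothetical alert level, to be tuned per use case

for name, sample in [("same lighting", prod_same_lighting),
                     ("new lighting", prod_new_lighting)]:
    stat = ks_statistic(train_brightness, sample)
    status = "drift suspected" if stat > DRIFT_THRESHOLD else "looks stable"
    print(f"{name}: KS statistic = {stat:.3f} -> {status}")
```

In practice the statistic being monitored would be chosen to match the kind of change you expect; mean brightness is used here only because the example is about lighting.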
2) Concept Drift occurs when there is a change in the relationship between the dependent and independent variables.
Let's say you have a material like steel or aluminum that is going to make up a part.
Material Specialists test material specimens in the laboratory and get results for tests in tension, compression and torsion.
The ML Engineer fits a regression model to the data supplied by the Material Specialists so that it represents the test data correctly. Simulation Mechanical Engineers then carry out simulations of a component made of the material, using the Machine Learning model supplied by the ML Engineer. It turns out that the component starts cracking under loads in service.
After investigation, the reason for the cracking turned out to be that the material behaves differently under a multi-axial state of stress (i.e. combined tension and torsion, or another combination of stress states) than under pure uni-axial states. This is Concept Drift because the mapping between the variables is different: there is a conceptual shift from a uni-axial state to a combined state of stress.
3) A simpler example of Concept Drift is a change in property prices. Suppose you have built an ML model that predicts housing prices based on several features like size, number of bedrooms, nearby utilities and so on. Over time, prices increase due to various developments, so the same features now map to a higher price. This is why retraining the model is important.
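The housing example can be sketched in a few lines. This is a minimal illustration with simulated data, not a real pricing model: the price-per-square-metre figures, the noise level and the helper names fit_line and mean_abs_error are all assumptions made for the example. The house sizes stay exactly the same in both periods, and only the relationship between size and price changes, which is what makes it Concept Drift rather than Data Drift.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_abs_error(model, xs, ys):
    """Average absolute difference between predicted and actual prices."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

# Simulated houses: size in square metres. The same sizes are used for both periods.
sizes = [rng.uniform(50, 200) for _ in range(200)]

# Assumed relationship at training time: about 2000 per square metre.
old_prices = [2000 * s + rng.gauss(0, 10000) for s in sizes]
# Assumed relationship later: about 2600 per square metre (prices went up).
new_prices = [2600 * s + rng.gauss(0, 10000) for s in sizes]

model = fit_line(sizes, old_prices)
error_old = mean_abs_error(model, sizes, old_prices)
error_new = mean_abs_error(model, sizes, new_prices)

retrained = fit_line(sizes, new_prices)
error_retrained = mean_abs_error(retrained, sizes, new_prices)

print(f"Old model on old prices: {error_old:,.0f}")
print(f"Old model on new prices: {error_new:,.0f}")
print(f"Retrained on new prices: {error_retrained:,.0f}")
```

The old model's error grows sharply on the newer prices even though the inputs look the same, and it drops back once the model is refit on recent data. Tracking prediction error over time against fresh labelled data is one common way to decide when to retrain.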