The Use of Regularization in the Context of Overfitting in Machine Learning
Over the past two months I have been hard at work at Atlas School learning about Machine Learning! I've recently taken some notes on regularization, specifically techniques used to combat Overfitting. They may be useful for your own Machine Learning journey!
Overfitting occurs when the model works really, really well with its current data set, but cannot work with new sets due to various issues, such as insufficient training data, irrelevant or misleading training data, training sessions that allow the machine to memorize patterns (hampering its ability to learn new ones), and the like. The topics that follow reflect measurements and strategies aimed at minimizing and mitigating the issue of overfitting.
One example we may recognize is the prevalent inability of modern AI to distinguish among minorities in its implementation of facial recognition. This is likely due to the training data not containing enough faces of minorities, leading to false positives at a higher rate for those groups. Unfortunately, these programs are used for surveillance and security, and this has led to many wrongful arrests.
L1 regularization (AKA Lasso regularization) encourages machines to set some of their coefficients to zero by adding the absolute value of the coefficients' magnitudes as a penalty to the loss function. The modified loss function is calculated with the formula below:
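A sketch of the penalized loss, assuming an ordinary least-squares base loss with coefficients β_j and a regularization strength λ (the base loss can vary by model):

$$\text{Loss} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p}\left|\beta_j\right|$$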
This limits the machine's processing of irrelevant and misleading data, and rather "zooms in" to specific parts of the data, allowing for feature selection.
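As a rough, illustrative sketch (the synthetic dataset and the alpha value here are my own choices, not from the sources below), scikit-learn's Lasso shows this feature-selection effect by driving the coefficients of irrelevant features to exactly zero:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first two of five features actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty (strength controlled by alpha) pushes the coefficients of the
# three irrelevant features to exactly zero, effectively selecting features.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # expect something like [~2.9, ~-1.9, 0., 0., 0.]
```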
L2 regularization (AKA Ridge Regression) works by limiting the variance of the coefficients, shrinking them toward zero without eliminating them. This is useful for collinear and codependent features. It adds the squared magnitude of the coefficients as a penalty, as calculated below:
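Again as a sketch, assuming the same least-squares base loss and notation as above:

$$\text{Loss} = \sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p}\beta_j^2$$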
It encourages our machine to find a balance of all features, or in other words, a Libra.
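A minimal sketch in scikit-learn (again with made-up data and an arbitrary alpha), showing how Ridge spreads weight across two nearly identical, collinear features instead of letting one coefficient explode:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Two nearly identical (collinear) features that both explain the target.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base, base + rng.normal(scale=0.01, size=(200, 1))])
y = 4 * base[:, 0] + rng.normal(scale=0.1, size=200)

# The L2 penalty keeps both coefficients small and similar, balancing the
# features rather than zeroing one out.
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_)  # expect two moderate, similar coefficients (~2 each)
```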
Dropout is a method that involves randomly ignoring neurons within chosen layers. Within a chosen layer, it will "turn off" a randomly determined subset of neurons at each training step. To make up for the missing neurons, the remaining neurons' outputs are scaled up in importance. This prevents neurons from becoming hyper-specific, allowing models to generalize to unseen data. The effect could be considered similar to the concept of expanding neuroplasticity in humans, or perhaps increasing the ability to problem solve by randomly strengthening remaining neurons.
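As a sketch of how this usually looks in practice (the layer sizes, the 30% dropout rate, and the input shape are arbitrary placeholders), Keras adds dropout as its own layer between dense layers:

```python
from tensorflow import keras
from tensorflow.keras import layers

# During training, 30% of each hidden layer's neurons are randomly "turned off"
# on every step; at inference time all neurons are active, and Keras rescales
# outputs automatically so the two modes match.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```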
Data Augmentation is the creation of new data sets from old ones, AKA synthetic data sets. It's like re-using play-doh after you're done mixing different sets of colors together. This is done through various techniques applied to the old training data, such as altering spatial properties, adjusting shades and hues, and introducing imperfections. This can be an effective process in that it increases the number of true negatives within the output. This method was especially useful back in the day (the 90's) when datasets were limited. Some downsides are that quality assurance is expensive but unfortunately necessary when working with the generated input, and that research and development on the process itself is also imperative.
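As an illustrative sketch (the specific transforms, their ranges, and the input shape are my own placeholder choices), Keras preprocessing layers can apply these spatial and color alterations randomly to each image during training:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Each training image is randomly flipped, rotated, zoomed, and contrast-jittered,
# producing "new" synthetic examples from the old ones on the fly.
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.2),
])

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    data_augmentation,  # only active during training
    layers.Conv2D(16, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```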
Early Stopping is exactly what it sounds like. It detects when performance is deteriorating by working with a validation dataset. Some signals you (or rather your program) can use to decide when to stop are an increase in false negatives, no change in a metric over a specific number of epochs, an absolute change in a metric, a decrease in performance, or an average change in a metric. Using this process, you make better use of the training data. It limits the number of epochs necessary, and thus limits the amount of time needed to reach a finished product.
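A minimal sketch with Keras's EarlyStopping callback; the monitored metric, the patience value, and the commented-out training call (with placeholder names like X_train and X_val) are assumptions for illustration:

```python
from tensorflow import keras

# Stop training once validation loss has not improved by at least min_delta
# for `patience` consecutive epochs, then roll back to the best weights seen.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    min_delta=1e-4,             # an "absolute change in a metric"
    patience=5,                 # "no change over a specific number of epochs"
    restore_best_weights=True,
)

# model.fit(X_train, y_train,
#           validation_data=(X_val, y_val),
#           epochs=200,
#           callbacks=[early_stop])
```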
This is my current understanding of the topics; I know much of it may be surface level. I look forward to revisiting this topic in the future!
Sources:
https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
https://www.geeksforgeeks.org/dropout-regularization-in-deep-learning/
https://www.datacamp.com/tutorial/complete-guide-data-augmentation
https://www.geeksforgeeks.org/regularization-by-early-stopping/