Nuts and Bolts of Optimization
Introduction
This article is a summary of an excellent presentation by Dr. Andrew Ng, "Nuts and Bolts of Applied Deep Learning": https://www.youtube.com/watch?v=F1ka6a13S9I
From that lecture, I have summarized the key points relevant to the AWS ML Specialty certification exam.
You will learn about common patterns and solutions for making progress when your model is not working as well as you expected.
When the performance of a model is poor, you have all kinds of options in front of you.
Selecting the right option(s) has a huge impact on how rapidly you make progress.
Example
As an example, we will use a speech recognition system.
Our goal is to build a system that matches human-level performance.
So, get a dataset with a lot of samples
Split it into 70% for training, 15% for validation, and 15% for testing
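The split above can be sketched in a few lines. This is a minimal shuffle-then-slice sketch assuming the data fits in memory; the 70/15/15 ratios are the ones from the text, and the fixed seed is an illustrative choice for reproducibility:

```python
import random

def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle samples and split them into train/validation/test sets."""
    data = list(samples)
    random.Random(seed).shuffle(data)   # deterministic shuffle
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]       # the remaining ~15%
    return train, val, test

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 700 150 150
```

In practice you would shuffle once and freeze the split so every experiment is evaluated against the same validation and test sets.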
As you build models, measure the following metrics:
Human-level error
Training error
Validation error
These three numbers will tell you what to do next.
Scenario 1
In this scenario, the training and validation errors are close. However, there is a big gap between human-level error and training error.
The model is doing much worse than human-level performance.
This gap in performance is called high bias.
Think of it this way: a biased person is disconnected from reality.
A biased model does not meet real-world needs.
A common cause of this problem is under-fitting: the model is too simple to capture the complexity of the data.
Solution options for under-fitting:
Train a bigger model
Train longer
Try a new model architecture
Scenario 2
In this scenario, human-level and training errors are close. However, the model is doing much worse on the validation set.
This gap in validation performance is called high variance.
A common cause of this problem is over-fitting: the model has memorized too much about the training data and is not generalizing well. So, we need to simplify the model.
Solution options for over-fitting:
Get more training data
Add regularization (L1, L2, dropout)
Try a new model architecture
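One common fix for over-fitting is regularization. A minimal sketch, assuming a simple L2 penalty added on top of whatever data loss you already compute; the lambda value is an arbitrary example, not a recommended setting:

```python
def l2_penalty(weights, lam=0.01):
    """L2 regularization term: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(base_loss, weights, lam=0.01):
    """Total loss = data loss + L2 penalty; large weights are penalized,
    which pushes the model toward simpler solutions."""
    return base_loss + l2_penalty(weights, lam)

w = [3.0, -4.0]                      # example weight vector
print(l2_penalty(w))                 # 0.01 * (9 + 16) = 0.25
print(regularized_loss(1.0, w))      # 1.0 + 0.25 = 1.25
```

Because the penalty grows with the squared weights, minimizing the total loss trades a little training accuracy for smaller weights and, usually, better generalization.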
Scenario 3
In this scenario, the model is performing much worse than human-level performance, so it has high bias.
In addition, validation performance is much worse than the training error, so it also has high variance.
This model has both high bias and high variance, and both sets of fixes apply.
Scenario 4
In production machine learning today, it is much more common for your training and test data to come from different data distributions
Your development team will fine-tune the model to improve validation performance. But when you then try the test set, you will see much worse performance because of the distribution difference, and a lot of tuning work is wasted.
To prevent this, make sure the validation and test sets come from the same distribution.
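One simple way to guarantee that the validation and test sets share a distribution is to carve both out of the same shuffled held-out pool. A minimal sketch; the 50/50 carve is an illustrative choice:

```python
import random

def make_val_test(holdout_samples, seed=0):
    """Shuffle the held-out pool, then carve validation and test from it,
    so both halves come from the same distribution."""
    pool = list(holdout_samples)
    random.Random(seed).shuffle(pool)
    mid = len(pool) // 2
    return pool[:mid], pool[mid:]   # validation, test

val, test = make_val_test(range(300))
print(len(val), len(test))  # 150 150
```

The key point is that the shuffle happens before the carve: whatever mixture of sources the pool contains, validation and test receive the same mixture.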
Human-Level Performance
How do you define human-level performance?
Let’s look at a medical diagnosis example.
Candidate definitions of human-level error include the error rate of a typical person, a typical doctor, an expert doctor, or a team of expert doctors.
Which of these is the most useful definition of human-level error?
A team of expert doctors is a good definition of human-level performance, and its error rate is the optimal error rate that we should strive for.
Workflow
Here is a workflow to handle these conditions and make progress: first compare training error with human-level error and apply the under-fitting fixes if the gap is large; then compare validation error with training error and apply the over-fitting fixes if that gap is large.
Data Synthesis
You can generate synthetic data, and for some problems, data synthesis can significantly improve the performance of the model
Here are some examples of data synthesis
Optical Character Recognition System
For OCR, how do you generate synthetic data? Here is one option
Download a random picture from the internet.
Using software like MS Word, pick a random font and a random word from the English dictionary.
Paste the word onto the random picture – you have just synthesized new data.
However, in reality, you need to blend the word with the image: blur it, adjust color and contrast, and so forth.
So it can be a lot of work to fine-tune. But once you fine-tune it, you have an unlimited source of data.
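The paste-and-blend step can be sketched without any imaging library by treating images as 2D grayscale arrays: alpha blending combines the rendered word with the background. This is a toy sketch; the alpha value and the tiny hand-written "glyph" are illustrative:

```python
def blend_word(background, word, top, left, alpha=0.8):
    """Alpha-blend a small grayscale word bitmap onto a background image.
    Both images are 2D lists of pixel intensities in [0, 255]."""
    out = [row[:] for row in background]          # copy the background
    for i, row in enumerate(word):
        for j, px in enumerate(row):
            bg = out[top + i][left + j]
            out[top + i][left + j] = alpha * px + (1 - alpha) * bg
    return out

bg = [[100] * 4 for _ in range(4)]    # flat gray background
word = [[0, 255], [255, 0]]           # tiny 2x2 "glyph"
result = blend_word(bg, word, top=1, left=1)
print(result[1][1])  # ~20  (0.8*0   + 0.2*100)
print(result[1][2])  # ~224 (0.8*255 + 0.2*100)
```

A real pipeline would use an imaging library to rasterize fonts, add blur, and vary contrast, but the compositing idea is the same.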
Speech recognition
For a speech recognition system, how do you synthesize new data?
Take a clean audio recording of a person – relatively noiseless audio.
Take random background sounds and mix them with the clean audio.
You now have an audio sample of how the person's voice would sound in the presence of background noise.
This provides you with an unlimited source of new data
For example, mix clean audio with car noise. The resulting clip sounds like a person talking in a car.
Use this data for training
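The mixing step above can be sketched directly on raw sample arrays. A minimal sketch; the noise_level scaling factor is an illustrative choice, and real audio would be arrays of many thousands of samples:

```python
def mix_audio(clean, noise, noise_level=0.3):
    """Mix clean speech samples with scaled background noise.
    Both inputs are equal-length lists of audio samples."""
    return [c + noise_level * n for c, n in zip(clean, noise)]

clean = [0.5, -0.2, 0.1]      # toy clean speech samples
car_noise = [0.1, 0.1, -0.1]  # toy background noise samples
mixed = mix_audio(clean, car_noise)
print(mixed)  # approximately [0.53, -0.17, 0.07]
```

Each clean clip can be mixed with many different noise clips at many different levels, which is what makes this an effectively unlimited source of training data.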
Unified Data Warehouse
With a consolidated data warehouse (or a data lake), access to the data is much smoother, which allows the team to make progress.
Be sure to discuss user access rights and privacy with the stakeholders.
To Enroll
AWS SageMaker - Certified Machine Learning Specialty Exam
Related Videos
For videos related to this topic, please watch the HyperParameter Tuning, Bias-Variance, and Regularization (L1, L2) lectures in the XGBoost section of this course.
For hyperparameter tuning, view the lectures in the Model Optimization and HyperParameter Tuning section.
For data warehouse and consolidation, please see the Data Lake section of this course
References
NIPS 2016 tutorial: "Nuts and bolts of building AI applications using Deep Learning" by Andrew Ng
https://www.youtube.com/watch?v=wjqaz6m42wU (longer video, better audio quality)
https://www.youtube.com/watch?v=F1ka6a13S9I (shorter version)