Nuts and Bolts of Optimization
Introduction
This article is a summary of an excellent presentation by Dr. Andrew Ng, "Nuts and Bolts of Applied Deep Learning": https://www.youtube.com/watch?v=F1ka6a13S9I
From that lecture, I have summarized the key points relevant to the AWS ML Specialty certification exam.
You will learn about common patterns and solutions for making progress when your model is not working as well as you expected.
When the performance of a model is poor, you have all kinds of options in front of you.
Selecting the right option(s) has a huge impact on how rapidly you make progress.
Example
As an example, we will use a speech recognition system.
Our goal is to build a system that matches human-level performance.
So, get a dataset with a lot of samples
Split it into 70% for training, 15% for validation, and 15% for testing
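The split above can be sketched in a few lines. This is a minimal shuffle-then-slice sketch assuming the data fits in memory; the 70/15/15 ratios are the ones from the text, and the fixed seed is an illustrative choice for reproducibility:

```python
import random

def split_dataset(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle samples and split them into train/validation/test sets."""
    data = list(samples)
    random.Random(seed).shuffle(data)   # deterministic shuffle
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]       # the remaining ~15%
    return train, val, test

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 700 150 150
```

In practice you would shuffle once and freeze the split so every experiment is evaluated against the same validation and test sets.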
As you build models, measure the following metrics:
Human-level error
Training error
Validation error
These three numbers will tell you what to do next.
Scenario 1
In this scenario, the training and validation errors are close. However, there is a big gap between human-level error and training error.
The model is doing much worse than human-level performance.
This gap in performance is called high bias.
Think of it this way: a biased person is disconnected from reality.
A biased model does not meet real-world needs.
A common cause of this problem is under-fitting: the model is too simple to capture the complexity of the data.
Solution options for under-fitting:
Train a bigger model
Train longer
Try a new model architecture
Scenario 2
In this scenario, human-level and training errors are close. However, the model is doing much worse on the validation set.
This gap in validation performance is called high variance.
A common cause of this problem is over-fitting: the model has memorized too much about the training data and is not generalizing well. So, we need to simplify the model.
Solution options for over-fitting:
Get more training data
Add regularization (L1, L2, dropout)
Try a new model architecture
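One common fix for over-fitting is regularization. A minimal sketch, assuming a simple L2 penalty added on top of whatever data loss you already compute; the lambda value is an arbitrary example, not a recommended setting:

```python
def l2_penalty(weights, lam=0.01):
    """L2 regularization term: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(base_loss, weights, lam=0.01):
    """Total loss = data loss + L2 penalty; large weights are penalized,
    which pushes the model toward simpler solutions."""
    return base_loss + l2_penalty(weights, lam)

w = [3.0, -4.0]                      # example weight vector
print(l2_penalty(w))                 # 0.01 * (9 + 16) = 0.25
print(regularized_loss(1.0, w))      # 1.0 + 0.25 = 1.25
```

Because the penalty grows with the squared weights, minimizing the total loss trades a little training accuracy for smaller weights and, usually, better generalization.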
Scenario 3
In this scenario, the model is performing much worse than human-level performance, so it has high bias.
In addition, validation performance is much worse than the training error, so it also has high variance.
This model has both high bias and high variance, and both sets of fixes apply.
Scenario 4
In production machine learning today, it is much more common for your training and test data to come from different data distributions
Your development team will fine-tune the model to improve validation performance. But when you then try the test set, you will see much worse performance because of the distribution difference, and a lot of tuning work is wasted.
To prevent this, make sure the validation and test sets come from the same distribution.
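One simple way to guarantee that the validation and test sets share a distribution is to carve both out of the same shuffled held-out pool. A minimal sketch; the 50/50 carve is an illustrative choice:

```python
import random

def make_val_test(holdout_samples, seed=0):
    """Shuffle the held-out pool, then carve validation and test from it,
    so both halves come from the same distribution."""
    pool = list(holdout_samples)
    random.Random(seed).shuffle(pool)
    mid = len(pool) // 2
    return pool[:mid], pool[mid:]   # validation, test

val, test = make_val_test(range(300))
print(len(val), len(test))  # 150 150
```

The key point is that the shuffle happens before the carve: whatever mixture of sources the pool contains, validation and test receive the same mixture.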
Human-Level Performance
How do you define human-level performance?
Let’s look at a medical diagnosis example.
Candidate definitions of human-level error include the error rate of a typical person, a typical doctor, an expert doctor, or a team of expert doctors.
Which of these is the most useful definition of human-level error?
A team of expert doctors is a good definition of human-level performance, and its error rate is the optimal error rate that we should strive for.
Workflow
Here is a workflow to handle these conditions and make progress: first compare training error with human-level error and apply the under-fitting fixes if the gap is large; then compare validation error with training error and apply the over-fitting fixes if that gap is large.
Data Synthesis
You can generate synthetic data, and for some problems, data synthesis can significantly improve the performance of the model
Here are some examples of data synthesis
Optical Character Recognition System
For OCR, how do you generate synthetic data? Here is one option
Download a random picture from the internet.
Using software like MS Word, pick a random font and a random word from the English dictionary.
Paste the word onto the random picture – you have just synthesized new data.
However, in reality, you need to blend the word with the image: blur it, adjust color and contrast, and so forth.
So it can be a lot of work to fine-tune. But once you fine-tune it, you have an unlimited source of data.
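The paste-and-blend step can be sketched without any imaging library by treating images as 2D grayscale arrays: alpha blending combines the rendered word with the background. This is a toy sketch; the alpha value and the tiny hand-written "glyph" are illustrative:

```python
def blend_word(background, word, top, left, alpha=0.8):
    """Alpha-blend a small grayscale word bitmap onto a background image.
    Both images are 2D lists of pixel intensities in [0, 255]."""
    out = [row[:] for row in background]          # copy the background
    for i, row in enumerate(word):
        for j, px in enumerate(row):
            bg = out[top + i][left + j]
            out[top + i][left + j] = alpha * px + (1 - alpha) * bg
    return out

bg = [[100] * 4 for _ in range(4)]    # flat gray background
word = [[0, 255], [255, 0]]           # tiny 2x2 "glyph"
result = blend_word(bg, word, top=1, left=1)
print(result[1][1])  # ~20  (0.8*0   + 0.2*100)
print(result[1][2])  # ~224 (0.8*255 + 0.2*100)
```

A real pipeline would use an imaging library to rasterize fonts, add blur, and vary contrast, but the compositing idea is the same.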
Speech recognition
For a speech recognition system, how do you synthesize new data?
Take a clean audio recording of a person – relatively noiseless audio.
Take random background sounds and mix them with the clean audio.
You now have an audio sample of how the person's voice would sound in the presence of background noise.
This provides you with an unlimited source of new data
For example, mix clean audio with car noise. The resulting clip sounds like a person talking in a car.
Use this data for training
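The mixing step above can be sketched directly on raw sample arrays. A minimal sketch; the noise_level scaling factor is an illustrative choice, and real audio would be arrays of many thousands of samples:

```python
def mix_audio(clean, noise, noise_level=0.3):
    """Mix clean speech samples with scaled background noise.
    Both inputs are equal-length lists of audio samples."""
    return [c + noise_level * n for c, n in zip(clean, noise)]

clean = [0.5, -0.2, 0.1]      # toy clean speech samples
car_noise = [0.1, 0.1, -0.1]  # toy background noise samples
mixed = mix_audio(clean, car_noise)
print(mixed)  # approximately [0.53, -0.17, 0.07]
```

Each clean clip can be mixed with many different noise clips at many different levels, which is what makes this an effectively unlimited source of training data.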
Unified Data Warehouse
With a consolidated data warehouse (or a data lake), access to the data is much smoother, which allows the team to make progress.
Be sure to discuss user access rights and privacy with the stakeholders.
To Enroll
AWS SageMaker - Certified Machine Learning Specialty Exam
Related Videos
For videos related to this topic, please watch the HyperParameter Tuning, Bias-Variance, and Regularization (L1, L2) lectures in the XGBoost section of this course.
For hyperparameter tuning, view the lectures in the Model Optimization and HyperParameter Tuning section.
For data warehouse and consolidation, please see the Data Lake section of this course
References
NIPS 2016 tutorial: "Nuts and bolts of building AI applications using Deep Learning" by Andrew Ng
https://www.youtube.com/watch?v=wjqaz6m42wU (longer video, better audio quality)
https://www.youtube.com/watch?v=F1ka6a13S9I (shorter version)