Nuts and Bolts of Optimization

Introduction

This article summarizes an excellent presentation by Dr. Andrew Ng, Nuts and Bolts of Applied Deep Learning: https://www.youtube.com/watch?v=F1ka6a13S9I

From the above lecture, I have summarized the key points for the AWS ML Specialty certification exam.

You will learn about common patterns and solutions for making progress when your model is not performing as well as you expected.

When the performance of a model is not good, you have all kinds of options in front of you:

  • Collect more data
  • Train longer
  • Hyperparameter optimization
  • Use a different algorithm or architecture
  • Regularization
  • Bigger model

Selecting the right option(s) has a huge impact on how rapidly you make progress

Example

As an example, we will build a speech recognition system.

Our goal is a system that matches human-level performance.

So, get a dataset with a lot of samples.

Split it into 70% for training, 15% for validation, and 15% for testing.
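The split above can be sketched in plain Python; the function name, fractions, and seed are illustrative assumptions, not from the lecture:

```python
import random

def train_val_test_split(samples, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle samples and split them into 70/15/15 train/validation/test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # deterministic shuffle for reproducibility
    n = len(items)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]       # remaining ~70% goes to training
    return train, val, test

train, val, test = train_val_test_split(range(1000))
print(len(train), len(val), len(test))   # 700 150 150
```

Shuffling before splitting matters: if the data is ordered (for example, by speaker), an unshuffled split gives the validation set a different distribution than the training set.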

As you build models, measure the following metrics:

  • Human-level error on the validation set (optimal error rate)
  • Training error
  • Validation error

These three numbers will tell you what to do next

Scenario 1

  • Human-level error = 1%
  • Training error = 5%
  • Validation error = 6%

In this scenario, the training and validation errors are close. However, there is a big gap between human-level error and training error.

The model is doing much worse than human-level performance.

This gap in performance is called high bias.

Think of it this way: a biased person is disconnected from reality, and a biased model does not meet real-world needs.

A common cause of this problem is under-fitting: the model is too simple to capture the complexity of the data.

Solution options for under-fitting:

  • Train a bigger model
  • Train longer
  • Try a new model architecture


Scenario 2

  • Human-level error = 1%
  • Training error = 2%
  • Validation error = 6%

In this scenario, human-level and training errors are close. However, the model is doing much worse on the validation set.

This gap in validation performance is called high variance.

A common cause of this problem is over-fitting: the model memorized too much about the training data and is not generalizing well. So, we need to simplify the model.

Solution options for over-fitting:

  • Add regularization
  • Try early stopping
  • Get more data (when training and validation data are not from the same distribution, validation error will be higher)
  • Try a new model architecture
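Early stopping from the list above can be sketched as a small helper; the class name, patience value, and error numbers are illustrative assumptions:

```python
class EarlyStopping:
    """Stop training when validation error stops improving for `patience` epochs."""
    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_error):
        """Record this epoch's validation error; return True when training should stop."""
        if val_error < self.best - self.min_delta:
            self.best = val_error       # improvement: remember it and reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1        # no improvement this epoch
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
for err in [0.30, 0.25, 0.26, 0.27, 0.24]:
    if stopper.step(err):
        print("stopping early")        # triggers after two epochs without improvement
        break
```

Stopping before the model fully fits the training set limits how much it can memorize, which is why early stopping acts as a form of regularization.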


Scenario 3

  • Human-level error = 1%
  • Training error = 5%
  • Validation error = 10%

In this scenario, the model is performing much worse than human-level performance, so it has high bias. In addition, validation performance is much worse than training performance.

This model has both high bias and high variance.

Scenario 4

  • Human-level error = 1%
  • Training error = 5%
  • Validation error = 6%
  • Test error = 10%

In production machine learning today, it is much more common for your training and test data to come from different data distributions

Your development team will fine-tune the model to improve validation performance. Then, when you try the test set, you will see much worse performance due to the distribution difference, and a lot of tuning work is wasted.

To prevent this, make sure the validation and test sets come from the same distribution.

Human-Level Performance

How do you define human-level performance?

Let’s look at a medical example:

  • Typical human error = 3%
  • Typical doctor = 1%
  • Expert doctor = 0.7%
  • Team of expert doctors = 0.5%

Which of these is the most useful definition of human-level error?

A team of expert doctors is a good definition of human-level performance, and it is the optimal error rate that we should strive for

Workflow

Here is a workflow to handle these conditions and make progress: first, compare training error with human-level error; if the gap is large, address bias (bigger model, train longer, new architecture). Next, compare validation error with training error; if that gap is large, address variance (more data, regularization, early stopping). Repeat until both gaps are small.
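The workflow can be sketched as a simple decision helper; the function name and the 2% gap threshold are illustrative assumptions:

```python
def next_steps(human_error, train_error, val_error, gap=0.02):
    """Suggest actions based on the bias/variance gaps between the three error rates."""
    actions = []
    if train_error - human_error > gap:   # high bias (under-fitting)
        actions += ["train a bigger model", "train longer", "try a new architecture"]
    if val_error - train_error > gap:     # high variance (over-fitting)
        actions += ["add regularization", "try early stopping", "get more data"]
    return actions or ["model is close to human-level performance"]

print(next_steps(0.01, 0.05, 0.06))   # Scenario 1: high bias only
print(next_steps(0.01, 0.02, 0.06))   # Scenario 2: high variance only
print(next_steps(0.01, 0.05, 0.10))   # Scenario 3: both
```

Note that both branches can fire at once, which is exactly Scenario 3 above.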

Data Synthesis

You can generate synthetic data, and for some problems, data synthesis can significantly improve the performance of the model

Here are some examples of data synthesis

Optical Character Recognition System

For OCR, how do you generate synthetic data? Here is one option:

Download a random picture from the internet.

Using software like MS Word, pick a random font and a random word from the English dictionary.

Paste the word onto the random picture - you have just synthesized new data.

However, in reality, you need to blend the word with the image: blur it, adjust color contrast, and so forth.

So, it can be a lot of work to fine-tune. But once you fine-tune, you have an unlimited source of data.
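The steps above can be sketched as a tiny data generator; the word, font, and background lists are illustrative stand-ins, and real systems would render and blend the text with an image library:

```python
import random

WORDS = ["hello", "world", "optimize", "gradient"]     # stand-in for an English dictionary
BACKGROUNDS = ["street.jpg", "desk.jpg", "park.jpg"]   # stand-in for downloaded pictures
FONTS = ["Arial", "Courier", "Georgia"]

def synthesize_ocr_sample(rng=random):
    """Pair a random word, font, and background; the word doubles as the label."""
    return {
        "background": rng.choice(BACKGROUNDS),
        "font": rng.choice(FONTS),
        "text": rng.choice(WORDS),   # this is the ground-truth label for training
    }

dataset = [synthesize_ocr_sample() for _ in range(5)]
```

Because every sample is generated, the label comes for free - no manual annotation is needed, which is what makes the data source effectively unlimited.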

Speech recognition

For a speech recognition system, how do you synthesize new data?

Take a clean audio recording of a person - relatively noiseless audio.

Take random background sounds and mix them with the clean audio.

You now have an audio sample of how the person's voice would sound in the presence of background noise.

This provides you with an unlimited source of new data

For example, mix clean audio and car noise. The audio clip sounds like a person talking in the car

Use this data for training
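The mixing step can be sketched in plain Python; the sample values and noise level are illustrative, and real systems would operate on audio arrays at a given sample rate:

```python
def mix_audio(clean, noise, noise_level=0.2):
    """Overlay background noise onto a clean speech signal, sample by sample.
    If the noise clip is shorter than the speech, it is looped to cover it."""
    mixed = []
    for i, sample in enumerate(clean):
        n = noise[i % len(noise)]            # loop the noise clip if needed
        mixed.append(sample + noise_level * n)
    return mixed

clean_speech = [0.5, -0.3, 0.1, 0.0]   # toy stand-in for a clean recording
car_noise = [0.1, -0.1]                # toy stand-in for a background clip
mixed = mix_audio(clean_speech, car_noise)
```

Varying the noise clip and the noise level turns one clean recording into many distinct training samples.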

Unified Data Warehouse

With a consolidated data warehouse (or a Data Lake), the access to the data is much smoother and allows the team to make progress

Be sure to discuss user access rights and privacy when consolidating data.

To Enroll

AWS SageMaker - Certified Machine Learning Specialty Exam

https://www.udemy.com/course/aws-machine-learning-a-complete-guide-with-python/?referralCode=9ADB4395937F7D656EB9

Related Videos

For videos related to this topic, please watch the HyperParameter Tuning, Bias-Variance, Regularization (L1, L2) lecture in the XGBoost section of this course.

For Hyperparameter and Tuning, view lectures in the Model Optimization and HyperParameter Tuning section

For data warehouse and consolidation, please see the Data Lake section of this course

References

NIPS 2016 tutorial: "Nuts and bolts of building AI applications using Deep Learning" by Andrew Ng

https://www.youtube.com/watch?v=wjqaz6m42wU (long video, better audio quality)

https://www.youtube.com/watch?v=F1ka6a13S9I (shorter version)
