From the course: Deep Learning with Python: Optimizing Deep Learning Models


Adaptive Moment Estimation (Adam)

- [Instructor] Adam, which stands for Adaptive Moment Estimation, combines the ideas of RMSProp and momentum. It maintains exponentially decaying averages of past gradients, known as first moment estimates, and of past squared gradients, known as second moment estimates. By doing so, Adam adapts the learning rate for each parameter and incorporates momentum to accelerate convergence and smooth out the optimization path. Momentum can be thought of as adding inertia to the optimization process. Instead of making updates based solely on the current gradient, it also considers the direction and magnitude of recent updates. This allows the optimization process to maintain momentum in directions of consistent gradient descent. Adam is designed to be computationally efficient, have low memory requirements, and be well suited to problems with large amounts of data and many parameters. It has become one of the most popular optimizers in deep learning due to its robust performance across a wide range of…
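To make the moment estimates concrete, here is a minimal NumPy sketch of a single Adam update step. This is an illustration rather than code from the course; the function name `adam_update` and the toy objective are assumptions, while the hyperparameter defaults (learning rate 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8) follow the common Adam defaults.

```python
import numpy as np

def adam_update(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: update moment estimates, then the parameters theta."""
    m = beta1 * m + (1 - beta1) * grad        # first moment: decaying average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: decaying average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):
    grad = 2 * theta
    theta, m, v = adam_update(theta, grad, m, v, t, lr=0.1)
print(theta)  # approaches 0
```

Note how the first moment supplies the momentum-like inertia, while dividing by the square root of the second moment scales each parameter's step size individually, the RMSProp-style adaptation described above.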
