From the course: Deep Learning with Python: Optimizing Deep Learning Models


Adaptive Delta (AdaDelta)


- [Instructor] Adaptive Delta, commonly known as AdaDelta, addresses AdaGrad's diminishing learning rate problem by restricting the window of accumulated past gradients to a fixed size. Instead of accumulating all past squared gradients, AdaDelta maintains a decaying moving average of recent squared gradients, similar to RMSprop. It goes a step further, however, by also adapting the update step size itself, effectively eliminating the need for a default learning rate. Because it focuses on recent gradients rather than the full history, AdaDelta keeps its effective learning rate from shrinking toward zero over the course of training, facilitating better convergence. Like AdaGrad, it also performs well on problems with sparse gradients, making it suitable for applications in natural language processing and other domains where data sparsity is a concern. While AdaDelta addresses some of the limitations of AdaGrad, it…
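The update rule described above can be sketched as follows. This is a minimal NumPy illustration of the AdaDelta recurrences (decaying average of squared gradients, and a step scaled by the RMS of past updates over the RMS of past gradients); the function name, the toy objective, and the hyperparameter values are illustrative choices, not part of the lesson.

```python
import numpy as np

def adadelta_step(x, grad, eg2, edx2, rho=0.95, eps=1e-6):
    """One AdaDelta update. Note there is no base learning rate:
    the step size is the ratio of the RMS of recent updates to the
    RMS of recent gradients."""
    # Decaying moving average of squared gradients (fixed-size window effect)
    eg2 = rho * eg2 + (1 - rho) * grad**2
    # Step scaled by RMS of past updates / RMS of past gradients
    delta = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * grad
    # Decaying moving average of squared updates
    edx2 = rho * edx2 + (1 - rho) * delta**2
    return x + delta, eg2, edx2

# Hypothetical demo: minimize f(x) = x^2, whose gradient is 2x
x = np.array([5.0])
eg2, edx2 = np.zeros_like(x), np.zeros_like(x)
for _ in range(2000):
    x, eg2, edx2 = adadelta_step(x, 2 * x, eg2, edx2)
print(x)  # x moves toward the minimum at 0
```

Note how `eps` both avoids division by zero and bootstraps the very first update, when the running average of past updates is still zero; AdaDelta therefore starts with tiny steps that grow as the accumulators warm up.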
