Maximizing Ensemble Diversity in Deep Reinforcement Learning
Co-authors for this project are Mariano Phielipp from Intel Labs and Ladislau Boloni from the University of Central Florida.
Key Messages:
The 2022 International Conference on Learning Representations (ICLR) runs April 25th through 29th. Dedicated to advances in representation learning, also known as deep learning, ICLR is the leading gathering of professionals presenting cutting-edge research on all aspects of deep learning and its diverse applications. Our work, Maximizing Ensemble Diversity in Deep Reinforcement Learning, will be presented at this year’s conference in Poster Session 5.
Ensemble Reinforcement Learning
Ensemble reinforcement learning is a “method of combining learning models to produce a single learner to perform inference on the data.” The approach is gaining popularity because it addresses long-standing training challenges such as sample efficiency, exploration, and high estimation bias, making it a go-to method for trial-and-error learning. However, training an ensemble of neural networks on the same data can cause the network collapse problem, in which all the networks start producing identical outputs, losing all the leverage of an ensemble.
Intel Labs, in collaboration with the University of Central Florida, proposed Maximizing Ensemble Diversity in Reinforcement Learning (MED-RL), a set of regularization techniques inspired by economic theory that maximize diversity between the neural networks by promoting inequality between their parameters. This regularization allows ensemble reinforcement learning algorithms to reach their full potential.
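Setting the paper's exact regularizers aside, the core idea can be sketched in a few lines: measure how unequal the corresponding parameters of the ensemble members are, and subtract that measure from the training loss so gradient descent drives the networks apart. The Gini-style helper and the weight `beta` below are illustrative assumptions, not MED-RL's exact formulation.

```python
import torch
import torch.nn as nn

def gini_across_members(stacked: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mean element-wise Gini coefficient across K ensemble copies.

    stacked: (K, P) tensor holding the K networks' copies of P parameters.
    Returns ~0 when all copies are identical; larger means more inequality.
    """
    x = stacked.abs() + eps                             # Gini assumes non-negative values
    pairwise = (x.unsqueeze(0) - x.unsqueeze(1)).abs()  # (K, K, P) pairwise gaps
    per_param = pairwise.mean(dim=(0, 1)) / (2 * x.mean(dim=0))
    return per_param.mean()

def diversity_bonus(ensemble: list[nn.Module]) -> torch.Tensor:
    """Average parameter inequality across the members of an ensemble."""
    bonuses = []
    for params in zip(*(net.parameters() for net in ensemble)):
        stacked = torch.stack([p.flatten() for p in params])  # (K, P)
        bonuses.append(gini_across_members(stacked))
    return torch.stack(bonuses).mean()

# In a training step, subtract the bonus so that minimizing the loss
# simultaneously increases inequality between the networks:
#   loss = td_loss - beta * diversity_bonus(ensemble)  # beta: illustrative weight
```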
Does Network Collapse Affect Performance?
Our work started with the conjecture that high similarity between the neural networks correlates with poor performance. To verify this hypothesis, we trained a MaxminDQN agent with two networks for 3000 episodes. The training graph and the similarity heatmaps are shown below in Figure 1. Notably, at episode 500 (heatmap A) and episode 2000 (heatmap C), the representation similarity between the neural networks is low while the average return is relatively high. In contrast, at episode 1000 (heatmap B) and episode 3000 (heatmap D), the representation similarity is highest, but the average return is lowest.
Figure 1: Training graph and similarity heatmaps of a MaxminDQN agent with two neural networks. The letters on the plot mark the times when similarities were calculated. Heatmaps A and C show relatively low similarity and correspond to relatively higher average return, while heatmaps B and D show extremely high similarity across all layers (see the diagonal values from bottom left to top right).
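For readers who want to reproduce such heatmaps: one standard measure of layer-by-layer representation similarity is linear Centered Kernel Alignment (CKA). Whether the figure uses CKA or another measure, the sketch below gives the general recipe; the activation matrices are assumed to be collected from a common batch of states.

```python
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (samples, features).

    Returns a value in [0, 1]; 1 means the two layers represent the batch
    identically up to a linear transform.
    """
    X = X - X.mean(axis=0)  # center each feature column
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(Y.T @ X, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# A Figure-1-style heatmap compares every layer of one network against
# every layer of the other on the same batch of inputs:
#   sim[i, j] = linear_cka(activations_net1[i], activations_net2[j])
```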
Different Weight Initialization for Ensemble Diversity
The most popular approach to the network collapse problem is initializing each network with different weights. To test this approach, we performed a toy experiment in which we trained two different architectures with different learning rates and batch sizes. We found that neural networks initialized with different weights still learn almost identical functions. Figure 2a shows the learned functions, while Figure 2b shows their similarity heatmap before and after training. In Figure 2b, you can see that the outputs of the trained networks were 98% similar. Therefore, this method is not suitable for promoting diversity between the neural networks in an ensemble.
Figure 2. Left: fitting a sine function with two different neural network architectures; the upper function was approximated with 64 neurons in each hidden layer and the lower function with 32. Right: similarity heatmaps between the layers of the two networks before and after training; the diagonal (bottom left to top right) measures the representation similarity of corresponding layers.
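This toy experiment is easy to reproduce. The widths, learning rates, and step counts below are illustrative stand-ins for the exact settings behind the figure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.linspace(-3.14, 3.14, 256).unsqueeze(1)
y = torch.sin(x)

def mlp(width: int) -> nn.Sequential:
    """Two-hidden-layer MLP; each call draws a fresh random initialization."""
    return nn.Sequential(nn.Linear(1, width), nn.ReLU(),
                         nn.Linear(width, width), nn.ReLU(),
                         nn.Linear(width, 1))

nets = [mlp(64), mlp(32)]                        # different architectures
for net, lr in zip(nets, (1e-3, 3e-4)):          # and different learning rates
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(2000):
        opt.zero_grad()
        F.mse_loss(net(x), y).backward()
        opt.step()

# Despite different widths, initializations, and learning rates, the two
# fitted functions end up nearly indistinguishable:
with torch.no_grad():
    print(F.mse_loss(nets[0](x), nets[1](x)).item())  # tiny gap between outputs
```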
For our approach, we integrated MED-RL into TD3 [1], SAC [2], and REDQ [3] for continuous control tasks, and into MaxminDQN [4] and EnsembleDQN [5] for discrete control tasks, with the goal of maximizing the diversity between the neural networks of each ensemble. We evaluated on six MuJoCo environments and six Atari games. Our results show that the MED-RL-augmented algorithms significantly outperform their unregularized counterparts, in some cases achieving more than 300% performance gains while being up to 75% more sample-efficient. A sample of the results is shown below in Table 1. As demonstrated, the proposed set of regularization techniques successfully maximizes the diversity between the networks of the ensemble. Furthermore, the sample-efficiency benefits of MED-RL suggest that it can be a useful tool in robotics, where data gathering is an expensive process.
Table 1. Maximum average return for MED-RL SAC over 5 trials of 1 million time steps. The maximum value for each task is bolded; ± denotes one standard deviation over trials.
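To make the integration concrete, the sketch below shows roughly where a MED-RL-style term slots into an ensemble critic update. Here `critics`, `batch`, `target`, and `beta` are illustrative placeholders, and `diversity_bonus` is the hypothetical helper sketched earlier, not the paper's exact code.

```python
import torch.nn.functional as F

def regularized_critic_loss(critics, batch, target, beta=0.1):
    """Sum of each member's TD loss minus a shared diversity bonus.

    Minimizing this loss fits every critic to the TD target while
    simultaneously pushing their parameters apart (MED-RL-style).
    """
    td_loss = sum(F.mse_loss(critic(batch.state, batch.action), target)
                  for critic in critics)
    return td_loss - beta * diversity_bonus(critics)
```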
References:
[1] Fujimoto, S., van Hoof, H., and Meger, D. "Addressing Function Approximation Error in Actor-Critic Methods." ICML, 2018.
[2] Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor." ICML, 2018.
[3] Chen, X., Wang, C., Zhou, Z., and Ross, K. "Randomized Ensembled Double Q-Learning: Learning Fast Without a Model." ICLR, 2021.
[4] Lan, Q., Pan, Y., Fyshe, A., and White, M. "Maxmin Q-learning: Controlling the Estimation Bias of Q-learning." ICLR, 2020.
[5] Anschel, O., Baram, N., and Shimkin, N. "Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning." ICML, 2017.