Rainbow: Combining Improvements in Deep Reinforcement Learning

Matteo Hessel, Joseph Modayil, H. V. Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Gheshlaghi Azar, David Silver

The deep reinforcement learning community has made several independent improvements to the DQN algorithm. [...] We also provide results from a detailed ablation study that shows the contribution of each component to overall performance.

To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies

In this paper, we explore state-of-the-art deep reinforcement learning methods for dialog policy training such as prioritized experience replay, double deep Q-Networks, dueling network architectures

Stabilizing Deep Reinforcement Learning with Conservative Updates

Experiments show that the proposed method reduces the variance of the process and improves the overall performance in off-policy actor-critic deep reinforcement learning regimes.

Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research

This work empirically revisits the paper that introduced the Rainbow algorithm and presents new insights into the algorithms Rainbow uses, arguing that traditional small-scale environments can still yield valuable scientific insights and can help reduce the barriers to entry for underprivileged communities.

Do recent advancements in model-based deep reinforcement learning really improve data efficiency?

It is demonstrated that the state-of-the-art model-free Rainbow DQN algorithm can be trained using a much smaller number of samples than is commonly reported, at a fraction of the complexity and computational cost.

Importance of using appropriate baselines for evaluation of data-efficiency in deep reinforcement learning for Atari

It is argued that an agent similar to the modified DQN presented in this paper should be used as a baseline for any future work aimed at improving the sample efficiency of deep reinforcement learning.

Lifting the Veil on Hyper-parameters for Value-based Deep Reinforcement Learning

This study conducts an initial empirical investigation into a number of often-overlooked hyperparameters for value-based deep RL agents, demonstrating their varying levels of importance on a varied set of classic control environments.

Compound Asynchronous Exploration and Exploitation

This work proposes an asynchronous approach to deep reinforcement learning that combines exploration and exploitation by applying a framework to off-the-shelf deep reinforcement learning algorithms; experimental results show that the proposed algorithm is superior in final performance and efficiency.

Ensemble and Auxiliary Tasks for Data-Efficient Deep Reinforcement Learning

A refined bias-variance-covariance decomposition is derived to analyze the different ways of learning ensembles and using auxiliary tasks, and the analysis is used to help explain a case study on Atari games under a limited data constraint.


This work employs a novel combination of latent dynamics modelling and goal-reaching objectives, which exploit the inherent structure of data in reinforcement learning, and demonstrates that the method scales well with network capacity and pretraining data.

Accelerated Methods for Deep Reinforcement Learning

This work investigates how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs, and confirms that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances.



Deep Reinforcement Learning with Double Q-Learning

This paper proposes a specific adaptation to the DQN algorithm and shows that the resulting algorithm not only reduces the observed overestimations, as hypothesized, but that this also leads to much better performance on several games.
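
The summary above does not spell out the adaptation. As a minimal sketch, assuming the usual formulation of Double Q-learning for DQN (the online network selects the next action, the target network evaluates it), the one-step target for a single transition could look like the following. The function names and the plain-list representation of Q-values are illustrative choices, not taken from the paper.

```python
def standard_q_target(reward, done, gamma, q_target_next):
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_q_target(reward, done, gamma, q_online_next, q_target_next):
    """Double Q-learning target (illustrative sketch).

    q_online_next: online-network Q-values for the next state, one per action.
    q_target_next: target-network Q-values for the same next state.
    """
    if done:
        return reward
    # Select the greedy action using the online estimates...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but evaluate that action using the target estimates.
    return reward + gamma * q_target_next[best_action]
```

When the two sets of estimates disagree about the best action, the double target is no larger than the standard one, which is the mechanism usually credited with reducing overestimation.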

Playing Atari with Deep Reinforcement Learning

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

Learning to Play in a Day: Faster Deep Reinforcement Learning by Optimality Tightening

We propose a novel training algorithm for reinforcement learning which combines the strength of deep Q-learning with a constrained optimization approach to tighten optimality and encourage faster

Dueling Network Architectures for Deep Reinforcement Learning

This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
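
For orientation, the dueling architecture is commonly described as two streams, a scalar state value and per-action advantages, that are combined into Q-values. The sketch below shows only that combining step, assuming the mean-subtracted form; `dueling_q_values` and its arguments are hypothetical names, and the network layers that would produce the two streams are omitted.

```python
def dueling_q_values(state_value, advantages):
    """Combine a state value and per-action advantages into Q-values.

    Subtracting the mean advantage keeps the decomposition identifiable:
    the resulting Q-values always average to state_value.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]
```

For example, `dueling_q_values(2.0, [1.0, 2.0, 3.0])` yields Q-values whose mean equals the state value 2.0, while the ordering of actions follows the advantages.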

Massively Parallel Methods for Deep Reinforcement Learning

This work presents the first massively distributed architecture for deep reinforcement learning, using a distributed neural network to represent the value function or behaviour policy, and a distributed store of experience to implement the Deep Q-Network algorithm.

Reinforcement Learning with Unsupervised Auxiliary Tasks

The proposed agent significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, with a mean speedup in learning of 10× and an average of 87% expert human performance on Labyrinth.

Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

This paper considers the challenging Atari games domain and proposes a new exploration method that assigns exploration bonuses from a concurrently learned model of the system dynamics; the method provides the most consistent improvement across a range of games that pose a major challenge for prior methods.

Noisy Networks for Exploration

It is found that replacing the conventional exploration heuristics for A3C, DQN and dueling agents with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.

Asynchronous Methods for Deep Reinforcement Learning

This work presents a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent to optimize deep neural network controllers, and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using visual input.

Deep Recurrent Q-Learning for Partially Observable MDPs

The effect of adding recurrence to a Deep Q-Network is investigated by replacing the first post-convolutional fully-connected layer with a recurrent LSTM, which successfully integrates information through time and replicates DQN's performance on standard Atari games and on partially observed equivalents featuring flickering game screens.