Corpus ID: 214728600

Importance of using appropriate baselines for evaluation of data-efficiency in deep reinforcement learning for Atari

@article{Kielak2020ImportanceOU,
  title={Importance of using appropriate baselines for evaluation of data-efficiency in deep reinforcement learning for Atari},
  author={Kacper Kielak},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.10181}
}
Reinforcement learning (RL) has seen great advancements in the past few years. Nevertheless, the consensus among the RL community is that currently used methods, despite all their benefits, suffer from extreme data inefficiency, especially in rich visual domains like Atari. To circumvent this problem, novel approaches were introduced that often claim to be much more efficient than popular variations of the state-of-the-art DQN algorithm. In this paper, however, we demonstrate that the newly…
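The abstract's central point is how data efficiency should be measured: a newly proposed method and a properly tuned baseline ought to be compared at the same fixed interaction budget. As a purely illustrative sketch (not code from the paper; the game names and scores below are hypothetical placeholders), the human-normalized score commonly used for such Atari comparisons can be computed as follows:

import statistics

def human_normalized_score(agent, random, human):
    # 0.0 corresponds to random play, 1.0 to human-level play
    return (agent - random) / (human - random)

# Hypothetical raw scores after the same fixed budget of environment steps
# (e.g. 100k agent steps); the values are placeholders, not results from the paper.
results = {
    # game:      (random, human,   tuned_baseline, new_method)
    "Breakout":  (1.7,    30.5,    12.0,           11.0),
    "Seaquest":  (68.4,   42054.7, 350.0,          370.0),
}

for name, s in results.items():
    baseline = human_normalized_score(s[2], s[0], s[1])
    new = human_normalized_score(s[3], s[0], s[1])
    print(f"{name}: tuned baseline={baseline:.3f}, new method={new:.3f}")

# Aggregate comparison (median over games), as commonly reported for Atari benchmarks
median_baseline = statistics.median(
    human_normalized_score(s[2], s[0], s[1]) for s in results.values()
)
median_new = statistics.median(
    human_normalized_score(s[3], s[0], s[1]) for s in results.values()
)
print(f"median: tuned baseline={median_baseline:.3f}, new method={median_new:.3f}")

Reporting both methods at the same interaction budget, rather than quoting each paper's own best-case setting, is the kind of like-for-like baseline comparison the abstract argues for.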

Citations of this paper

Learning Representations for Pixel-based Control: What Matters and Why?

This paper presents a simple baseline approach that can learn meaningful representations with no metric-based learning, no data augmentations, no world-model learning, and no contrastive learning, and hopes this view can motivate researchers to rethink representation learning when investigating how to best apply RL to real-world tasks.

Bag of Tricks for Natural Policy Gradient Reinforcement Learning

Experimental results indicate that the proposed collection of strategies for performance optimization can improve results by 86% to 181% across the MuJoCo control benchmark, with TENGraD exhibiting the best approximation performance amongst the tested approximations.

Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation

Nonparametric Approximation of Inter-Trace returns is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing.

Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning

This work provides the first provably efficient online RL algorithm that incorporates contrastive learning for representation learning, and theoretically proves that the algorithm recovers the true representations and simultaneously achieves sample efficiency in learning the optimal policy and Nash equilibrium in Markov decision processes (MDPs) and Markov games (MGs).

Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning

Experimental results show that on average, improved second-order approximations achieve the best performance and that using properly tuned hyperparameters can lead to large improvements in performance and sample efficiency ranging up to +181%.

References

Showing 1-10 of 23 references

Rainbow: Combining Improvements in Deep Reinforcement Learning

This paper examines six extensions to the DQN algorithm and empirically studies their combination, showing that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance.

Model-Based Reinforcement Learning for Atari

Simulated Policy Learning (SimPLe), a complete model-based deep RL algorithm based on video prediction models, is described and a comparison of several model architectures is presented, including a novel architecture that yields the best results in the authors' setting.

A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games

This work extends a recently developed deep neural network for video frame prediction in Atari games to enable reward prediction as well, phrasing a joint optimization problem that minimizes both video frame and reward reconstruction losses and adapting network parameters accordingly.

Sample-Efficient Deep RL with Generative Adversarial Tree Search

It is theoretically shown that GATS improves the bias-variance trade-off in DRL, significantly reduces the bias in Q estimates, and leads to a drastic reduction of the sample complexity of DQN by a factor of 200%.

Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

EBU achieves the same mean and median human-normalized performance as DQN while using only 5% and 10% of the samples, respectively.

Value Prediction Network

This paper proposes a novel deep reinforcement learning architecture, called Value Prediction Network (VPN), which integrates model-free and model-based RL methods into a single neural network and outperforms Deep Q-Network on several Atari games even with short-lookahead planning.

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

Action-Conditional Video Prediction using Deep Networks in Atari Games

This paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs and proposes and evaluates two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks.

Imagination-Augmented Agents for Deep Reinforcement Learning

Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects, shows improved data efficiency, performance, and robustness to model misspecification compared to several baselines.

Dopamine: A Research Framework for Deep Reinforcement Learning

Dopamine is an open-source, TensorFlow-based framework providing compact and reliable implementations of some state-of-the-art deep RL agents, complemented by a taxonomy of the different research objectives in deep RL research.