Publications
The Arcade Learning Environment: An Evaluation Platform for General Agents (Extended Abstract)
tl;dr: In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology.
Citations: 1,284 · Influence: 247
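As a concrete sketch (not from the paper), a minimal random-agent loop on the ALE might look like the following, assuming the ale-py Python bindings and a local ROM file whose path here is purely illustrative:

```python
import random
from ale_py import ALEInterface  # pip install ale-py

ale = ALEInterface()
ale.loadROM("breakout.bin")  # illustrative path to a local Atari 2600 ROM

actions = ale.getMinimalActionSet()
score = 0.0
while not ale.game_over():
    score += ale.act(random.choice(actions))  # act() returns the step reward
print("episode score:", score)
```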
A Distributional Perspective on Reinforcement Learning
tl;dr: We argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent.
Citations: 390 · Influence: 79
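The value distribution obeys a distributional Bellman equation; the recursion below restates it, where equality holds in distribution and the usual value Q is recovered as the expectation of Z:

```latex
Z(x, a) \overset{D}{=} R(x, a) + \gamma\, Z(X', A'),
\qquad X' \sim P(\cdot \mid x, a), \quad A' \sim \pi(\cdot \mid X'),
\qquad Q(x, a) = \mathbb{E}\, Z(x, a).
```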
Unifying Count-Based Exploration and Intrinsic Motivation
tl;dr: We use density models to measure uncertainty in non-tabular reinforcement learning, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model.
Citations: 544 · Influence: 78
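Concretely, the pseudo-count is computed from the density model's probability of a state before and after one additional observation of it; writing \rho_n(x) for the former and \rho'_n(x) for the recoding probability:

```latex
\hat{N}_n(x) \;=\; \frac{\rho_n(x)\,\bigl(1 - \rho'_n(x)\bigr)}{\rho'_n(x) - \rho_n(x)},
```

and the exploration bonus then scales as \hat{N}_n(x)^{-1/2}.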
Safe and Efficient Off-Policy Reinforcement Learning
tl;dr: In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning.
Citations: 288 · Influence: 43
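The paper's central algorithm, Retrace(λ), truncates the per-step importance-sampling ratios at 1. Below is a minimal NumPy sketch of the resulting backward recursion for the return-based targets; the array names and shapes are assumptions for illustration:

```python
import numpy as np

def retrace_targets(q, exp_q_next, rewards, pi, mu, gamma=0.99, lam=1.0):
    """Retrace(lambda) targets along one sampled off-policy trajectory.

    q          -- Q(x_t, a_t) for the actions actually taken      [T]
    exp_q_next -- E_{a ~ pi} Q(x_{t+1}, a)                        [T]
    rewards    -- r_t                                             [T]
    pi, mu     -- target / behaviour probabilities of each a_t    [T]
    """
    deltas = rewards + gamma * exp_q_next - q   # TD errors
    c = lam * np.minimum(1.0, pi / mu)          # truncated importance ratios
    g, targets = 0.0, np.empty_like(q)
    for t in reversed(range(len(q))):           # G_t = d_t + gamma*c_{t+1}*G_{t+1}
        tail = c[t + 1] * g if t + 1 < len(q) else 0.0
        g = deltas[t] + gamma * tail
        targets[t] = q[t] + g
    return targets
```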
Distributional Reinforcement Learning with Quantile Regression
tl;dr: In reinforcement learning an agent interacts with the environment by taking actions and observing the next state and reward.
Citations: 112 · Influence: 36
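The method the title refers to, quantile regression over the return distribution, minimizes a quantile Huber loss between predicted quantiles and distributional Bellman targets. A NumPy sketch under assumed 1-D shapes (names are illustrative):

```python
import numpy as np

def quantile_huber_loss(theta, targets, kappa=1.0):
    """Quantile-regression loss between N predicted quantiles `theta`
    and a set of distributional Bellman targets `targets` (1-D arrays)."""
    n = len(theta)
    tau_hat = (np.arange(n) + 0.5) / n             # quantile midpoints
    u = targets[None, :] - theta[:, None]          # pairwise TD errors [N, N']
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    weight = np.abs(tau_hat[:, None] - (u < 0))    # asymmetric quantile weight
    return (weight * huber / kappa).sum(axis=0).mean()
```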
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
tl;dr: We introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions.
Citations: 180 · Influence: 31
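Sticky actions inject stochasticity by repeating the agent's previous action with some probability (0.25 is the paper's suggested default). A hedged wrapper sketch around an ALE-style act() interface; the class and attribute names are assumptions:

```python
import random

class StickyActions:
    """With probability `repeat_prob`, execute the previous action
    instead of the one the agent just chose."""
    def __init__(self, env, repeat_prob=0.25):
        self.env, self.repeat_prob = env, repeat_prob
        self.prev_action = 0  # NOOP

    def act(self, action):
        if random.random() < self.repeat_prob:
            action = self.prev_action
        self.prev_action = action
        return self.env.act(action)
```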
The Cramer Distance as a Solution to Biased Wasserstein Gradients
tl;dr: We show that the Cramer distance possesses all three desired properties (sum invariance, scale sensitivity, and unbiased sample gradients), combining the best of the Wasserstein and Kullback-Leibler divergences.
Citations: 158 · Influence: 24
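For reference, the Cramer distance between distributions P and Q with cumulative distribution functions F_P and F_Q is:

```latex
\ell_2(P, Q) \;=\; \left( \int_{-\infty}^{\infty} \bigl(F_P(x) - F_Q(x)\bigr)^2 \, dx \right)^{1/2}.
```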
Automated Curriculum Learning for Neural Networks
tl;dr: We introduce a method for automatically selecting the path, or syllabus, that a neural network follows through a curriculum so as to maximise learning efficiency.
Citations: 188 · Influence: 22
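The paper treats syllabus selection as an adversarial multi-armed bandit (an Exp3.S variant) whose reward is a learning-progress signal. The sketch below is a plain Exp3-style sampler, simplified from the paper's variant; the names and step size are assumptions:

```python
import numpy as np

def make_exp3(num_tasks, eta=0.1):
    """Minimal Exp3-style bandit over curriculum tasks; the feedback is a
    learning-progress signal (e.g. the decrease in training loss)."""
    weights = np.zeros(num_tasks)

    def sample():
        probs = np.exp(weights - weights.max())
        probs /= probs.sum()
        return np.random.choice(num_tasks, p=probs), probs

    def update(task, probs, progress):
        weights[task] += eta * progress / probs[task]  # importance-weighted
    return sample, update
```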
Count-Based Exploration with Neural Density Models
tl;dr: We use PixelCNN, an advanced neural density model for images, to supply a pseudo-count that generates an exploration bonus for a DQN agent; combined with a mixed Monte Carlo update, this was sufficient to achieve state of the art on Montezuma's Revenge.
Citations: 157 · Influence: 19
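Two ingredients from the tl;dr can be sketched directly: the pseudo-count exploration bonus added to the environment reward, and the mixed Monte Carlo update that blends the one-step target with the episodic return. The constants below are illustrative, not the paper's tuned values:

```python
import math

def augmented_reward(r, pseudo_count, beta=0.05):
    """Add a pseudo-count exploration bonus to the environment reward.
    beta and the 0.01 stabiliser are illustrative values."""
    return r + beta / math.sqrt(pseudo_count + 0.01)

def mixed_monte_carlo_target(one_step, mc_return, mix=0.1):
    """Mixed Monte Carlo update: blend the one-step bootstrap target
    with the full episodic return (mix ratio is illustrative)."""
    return (1.0 - mix) * one_step + mix * mc_return
```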
Increasing the Action Gap: New Operators for Reinforcement Learning
This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency.
Citations: 81 · Influence: 16
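The consistent Bellman operator modifies the usual Bellman backup by subtracting the action gap on self-transitions, which is what widens max_b Q(x, b) − Q(x, a):

```latex
(\mathcal{T}_C Q)(x, a) \;=\; r(x, a) + \gamma\, \mathbb{E}_{x' \sim P(\cdot \mid x, a)}
\Bigl[ \max_b Q(x', b) \;-\; \mathbb{1}_{[\,x' = x\,]} \bigl( \max_b Q(x, b) - Q(x, a) \bigr) \Bigr].
```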