Publications
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
We develop a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
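The importance weighting in IMPALA refers to the V-trace off-policy correction, which turns trajectories generated by a lagging behaviour policy into value targets for the learner. A minimal NumPy sketch of the target computation follows; the formula matches the paper's truncated importance ratios, but the function and argument names here are illustrative, not from any released implementation.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap, rhos, gamma=0.99,
                   rho_bar=1.0, c_bar=1.0):
    """Compute V-trace value targets v_s along one trajectory.

    rewards, values, rhos: arrays of length T, where rhos are the
    importance ratios pi(a_t|x_t) / mu(a_t|x_t) between the target
    and behaviour policies.
    bootstrap: V(x_T), the value estimate at the final state.
    """
    T = len(rewards)
    clipped_rhos = np.minimum(rho_bar, rhos)   # rho_t = min(rho_bar, pi/mu)
    cs = np.minimum(c_bar, rhos)               # c_t  = min(c_bar, pi/mu)
    next_values = np.append(values[1:], bootstrap)
    # TD terms: delta_t = rho_t * (r_t + gamma V(x_{t+1}) - V(x_t))
    deltas = clipped_rhos * (rewards + gamma * next_values - values)
    # Backward recursion:
    # v_s - V(x_s) = delta_s + gamma c_s (v_{s+1} - V(x_{s+1}))
    acc = 0.0
    targets = np.zeros(T)
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * cs[t] * acc
        targets[t] = values[t] + acc
    return targets
```

With on-policy data (all ratios equal to 1) the targets reduce to ordinary n-step returns, which is a useful sanity check.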
A Distributional Perspective on Reinforcement Learning
We argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent.
Unifying Count-Based Exploration and Intrinsic Motivation
We use density models to measure uncertainty in non-tabular reinforcement learning, and propose a novel algorithm for deriving a pseudo-count from an arbitrary density model.
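The pseudo-count construction compares a density model's probability of a state before and after one more observation of it; the closed form below is the one derived in the paper, though the bonus coefficient and the helper names are illustrative.

```python
def pseudo_count(rho, rho_prime):
    """Pseudo-count N for a state x, given the density model's
    probability rho of x before observing it and the 'recoding'
    probability rho_prime after observing x one more time.
    Requires rho_prime > rho (a 'learning-positive' model)."""
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(n_hat, beta=0.05):
    # Count-based bonus of the form beta / sqrt(N + 0.01); beta here
    # is an arbitrary illustrative scale, not a recommended value.
    return beta / (n_hat + 0.01) ** 0.5
```

As a sanity check, plugging in the empirical frequencies of a tabular count model (rho = N/n, rho_prime = (N+1)/(n+1)) recovers the true count N exactly.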
Minimax Regret Bounds for Reinforcement Learning
We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs.
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
This paper considers a variant of the basic algorithm for the stochastic multi-armed bandit problem that takes into account the empirical variance of the different arms.
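The variance-aware index this line of work studies (UCB-V) replaces Hoeffding-style bonuses with a Bernstein-style bonus built from the empirical variance. A sketch of one common form of the index follows; the exact exploration constants vary across statements of the algorithm, so treat the defaults here as placeholders.

```python
import math

def ucb_v_index(mean, var, s, t, b=1.0, zeta=1.2, c=1.0):
    """UCB-V style index for a single arm.

    mean, var: empirical mean and variance of the arm's rewards
    s: number of pulls of this arm, t: total pulls so far
    b: known upper bound on the reward range
    zeta, c: exploration constants (illustrative defaults).
    """
    e = zeta * math.log(t)
    # Bernstein-style bonus: variance term shrinks for low-variance arms,
    # plus a 1/s range-dependent correction term.
    return mean + math.sqrt(2.0 * var * e / s) + 3.0 * c * b * e / s
```

The key qualitative property is that a low-variance arm gets a much smaller bonus than a high-variance arm with the same mean, which is what improves over plain UCB.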
Finite-Time Bounds for Fitted Value Iteration
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes.
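Fitted value iteration alternates a one-step Bellman backup with a regression step that projects the backed-up values onto a function class. The sketch below uses exact backups on a known finite MDP and least-squares regression onto a feature matrix; the paper's setting (sampled backups, infinite state spaces) is more general, and all names here are illustrative.

```python
import numpy as np

def fitted_value_iteration(P, R, phi, gamma=0.95, iters=100):
    """Least-squares fitted value iteration on a finite MDP.

    P: (A, S, S) transition matrices, R: (A, S) rewards,
    phi: (S, d) feature matrix defining the regression class.
    """
    w = np.zeros(phi.shape[1])
    for _ in range(iters):
        v = phi @ w
        # One-step Bellman backup targets: y = max_a [ R_a + gamma * P_a v ]
        y = (R + gamma * (P @ v)).max(axis=0)
        # Regression step: project the targets back onto the feature span
        w, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return phi @ w
```

With identity features the projection is exact and the loop reduces to ordinary value iteration; the paper's finite-time bounds quantify how approximation error in the regression step propagates through the iterations.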
Safe and Efficient Off-Policy Reinforcement Learning
In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning.
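The central algorithm of this paper is Retrace(lambda), which makes return-based off-policy evaluation safe by truncating the per-step importance ratios at 1. A minimal NumPy sketch of the target computation, with illustrative names, is:

```python
import numpy as np

def retrace_targets(q_sa, exp_q_next, rewards, rhos, gamma=0.99, lam=1.0):
    """Retrace(lambda) targets for Q(x_t, a_t) along one trajectory.

    q_sa:       Q(x_t, a_t) for t = 0..T-1
    exp_q_next: E_{a ~ pi} Q(x_{t+1}, a) for t = 0..T-1 (0 at terminal)
    rhos:       importance ratios pi(a_t|x_t) / mu(a_t|x_t)
    """
    T = len(rewards)
    cs = lam * np.minimum(1.0, rhos)   # truncated traces c_t = lam * min(1, rho_t)
    deltas = rewards + gamma * exp_q_next - q_sa
    acc = 0.0
    targets = np.zeros(T)
    for t in reversed(range(T)):
        # The trace product starts at t+1, so the recursion uses c_{t+1}
        acc = deltas[t] + gamma * (cs[t + 1] if t + 1 < T else 0.0) * acc
        targets[t] = q_sa[t] + acc
    return targets
```

Because the traces are capped at 1, the correction never blows up no matter how far the behaviour policy is from the target policy, which is the "safe" part of the title.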
Distributional Reinforcement Learning with Quantile Regression
In reinforcement learning, an agent interacts with the environment by taking actions and observing the next state and reward.
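This paper trains a critic to output N quantiles of the return distribution by minimising an asymmetric quantile-regression loss, smoothed with a Huber penalty. A self-contained sketch of that loss, with illustrative names and shapes, is:

```python
import numpy as np

def quantile_huber_loss(theta, targets, kappa=1.0):
    """Quantile-regression Huber loss between predicted quantiles and
    target samples.

    theta:   (N,) predicted quantile values at midpoints tau_i = (2i+1)/(2N)
    targets: (M,) samples of the target return distribution
    """
    N = len(theta)
    taus = (np.arange(N) + 0.5) / N
    u = targets[None, :] - theta[:, None]          # pairwise TD errors, (N, M)
    huber = np.where(np.abs(u) <= kappa,
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weight |tau - 1{u < 0}| makes theta_i track quantile tau_i
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    return (weight * huber / kappa).mean()
```

Minimising this loss in expectation drives each theta_i toward the tau_i-quantile of the target distribution, which is what lets a fixed set of outputs represent an arbitrary return distribution.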
Sample Efficient Actor-Critic with Experience Replay
We introduce an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the discrete 57-game Atari domain and several continuous control problems.
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
We study a policy-iteration algorithm where the iterates are obtained via empirical risk minimization with a risk function that penalizes high magnitudes of the Bellman-residual.
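The empirical risk in question is the mean squared Bellman residual measured along the observed sample path. The sketch below shows that risk for a tabular Q-function; the paper works with general function classes and also addresses the bias this naive estimator has under stochastic transitions, which is omitted here, and all names are illustrative.

```python
import numpy as np

def bellman_residual_loss(Q, transitions, actions, gamma=0.9):
    """Empirical squared Bellman-residual risk over one sample path.

    Q: dict mapping (state, action) -> value
    transitions: list of (x, a, r, x_next) tuples from the trajectory
    actions: the (finite) action set
    """
    residuals = [
        Q[(x, a)] - (r + gamma * max(Q[(xn, b)] for b in actions))
        for (x, a, r, xn) in transitions
    ]
    return float(np.mean(np.square(residuals)))
```

A Q-function that satisfies the Bellman optimality equation on every observed transition makes this risk zero, which is why minimising it over a function class yields near-optimal policies under the paper's assumptions.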