• Publications
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
TLDR
We develop a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
  • 590
  • 118
Progressive Neural Networks
TLDR
We introduce progressive networks, a novel model architecture with explicit support for transfer across sequences of tasks, and show that it outperforms common baselines based on pretraining and finetuning.
  • 563
  • 76
Learning to reinforcement learn
TLDR
In recent years, deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains.
  • 478
  • 60
Learning to Navigate in Complex Environments
TLDR
We propose a deep RL method, augmented with memory and auxiliary learning targets, for training agents to navigate within large and visually rich environments that include frequently changing start and goal locations.
  • 535
  • 33
Grounded Language Learning in a Simulated 3D World
TLDR
We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf.
  • 173
  • 21
Vector-based navigation using grid-like representations in artificial agents
TLDR
Grid-like representations emerge spontaneously within a neural network trained to self-localize, enabling the agent to take shortcuts to destinations using vector-based navigation.
  • 267
  • 18
Prefrontal cortex as a meta-reinforcement learning system
Over the past 20 years, neuroscience research on reward-based learning has converged on a canonical model, under which the neurotransmitter dopamine ‘stamps in’ associations between situations, …
  • 211
  • 14
Multi-task Deep Reinforcement Learning with PopArt
TLDR
The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks.
  • 105
  • 10
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
TLDR
We introduce V-MPO, an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO) that performs policy iteration based on a learned state-value function.
  • 30
  • 7
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
TLDR
We introduce R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.
  • 21
  • 3