IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
- Lasse Espeholt, Hubert Soyer, K. Kavukcuoglu
- 5 February 2018
Computer Science
International Conference on Machine Learning
A new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) is developed that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation.
Progressive Neural Networks
- Andrei A. Rusu, Neil C. Rabinowitz, R. Hadsell
- 15 June 2016
Computer Science
arXiv.org
This work evaluates the progressive networks architecture extensively, shows that it outperforms common baselines based on pretraining and finetuning, and demonstrates that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
Learning to reinforcement learn
- Jane X. Wang, Z. Kurth-Nelson, M. Botvinick
- 17 November 2016
Computer Science
Annual Meeting of the Cognitive Science Society
This work introduces a novel approach to deep meta-reinforcement learning: a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure.
Learning to Navigate in Complex Environments
- Piotr Wojciech Mirowski, Razvan Pascanu, R. Hadsell
- 11 November 2016
Computer Science
International Conference on Learning…
This work considers jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks and shows that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs.
Vector-based navigation using grid-like representations in artificial agents
- Andrea Banino, C. Barry, D. Kumaran
- 9 May 2018
Computer Science, Biology
Nature
These findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation, and support neuroscientific theories that see grid cells as critical for vector-based navigation.
Grounded Language Learning in a Simulated 3D World
- K. Hermann, Felix Hill, P. Blunsom
- 20 June 2017
Computer Science
arXiv.org
An agent is presented that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. Its comprehension of language extends beyond its prior experience, enabling it to apply familiar language to unfamiliar situations and to interpret entirely novel instructions.
Prefrontal cortex as a meta-reinforcement learning system
- Jane X. Wang, Z. Kurth-Nelson, M. Botvinick
- 13 April 2018
Biology, Psychology
bioRxiv
A new theory is presented showing how learning to learn may arise from interactions between prefrontal cortex and the dopamine system, providing a fresh foundation for future research.
Multi-task Deep Reinforcement Learning with PopArt
- Matteo Hessel, Hubert Soyer, Lasse Espeholt, Wojciech M. Czarnecki, Simon Schmitt, H. V. Hasselt
- 12 September 2018
Computer Science
AAAI Conference on Artificial Intelligence
This work proposes to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics, and learns a single trained policy that exceeds median human performance on this multi-task domain.
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control
- H. F. Song, A. Abdolmaleki, M. Botvinick
- 26 September 2019
Computer Science
International Conference on Learning…
V-MPO is introduced, an on-policy adaptation of Maximum a Posteriori Policy Optimization that performs policy iteration based on a learned state-value function and does so reliably without importance weighting, entropy regularization, or population-based tuning of hyperparameters.
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems
- T. Paine, Çaglar Gülçehre, Worlds Team
- 3 September 2019
Computer Science
International Conference on Learning…
R2D3 is introduced, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions, together with a suite of eight tasks that combine these three properties.