Corpus ID: 6875312

Asynchronous Methods for Deep Reinforcement Learning

@inproceedings{mnih2016asynchronous,
  title={Asynchronous Methods for Deep Reinforcement Learning},
  author={Volodymyr Mnih and Adri{\`a} Puigdom{\`e}nech Badia and Mehdi Mirza and Alex Graves and Timothy P. Lillicrap and Tim Harley and David Silver and Koray Kavukcuoglu},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2016}
}
We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state of the art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control…


Accelerated Methods for Deep Reinforcement Learning
This work investigates how to optimize existing deep RL algorithms for modern computers, specifically for a combination of CPUs and GPUs, and confirms that both policy gradient and Q-value learning algorithms can be adapted to learn using many parallel simulator instances.
Efficient Parallel Methods for Deep Reinforcement Learning
A novel framework for efficient parallelization of deep reinforcement learning algorithms, enabling these algorithms to learn from multiple actors on a single machine, and can be efficiently implemented on a GPU, allowing the usage of powerful models while significantly reducing training time.
A New Asynchronous Architecture for Tabular Reinforcement Learning Algorithms
One of these algorithms, called Asynchronous Dyna-Q, surpasses existing asynchronous reinforcement learning algorithms, balances exploration and exploitation well, and solves discrete-space path planning problems efficiently.
Training a deep policy gradient-based neural network with asynchronous learners on a simulated robotic problem
This paper investigates to what extent a recent asynchronously parallel actor-critic approach, initially proposed to speed up discrete RL algorithms, can be used for the continuous control of robotic arms.
Deep Reinforcement Learning for Doom using Unsupervised Auxiliary Tasks
A divide-and-conquer deep reinforcement learning solution is proposed for more complex environments in which the reward is sparse and the state space is huge, and an agent is tested in the first-person shooter game Doom.
Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates
It is demonstrated that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots.
A Brief Survey of Deep Reinforcement Learning
This survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic, and highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning.
Agent Modeling as Auxiliary Task for Deep Reinforcement Learning
The results show that the proposed architectures stabilize learning and outperform the standard A3C architecture when learning a best response in terms of expected rewards.
Multi-Task Deep Reinforcement Learning for Continuous Action Control
In this paper, we propose a deep reinforcement learning algorithm to learn multiple tasks concurrently. A new network architecture is proposed in the algorithm which reduces the number of parameters…
Deep Reinforcement Learning With Macro-Actions
This paper focuses on macro-actions, and evaluates these on different Atari 2600 games, where they yield significant improvements in learning speed and can even achieve better scores than DQN.


Continuous control with deep reinforcement learning
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Distributed Deep Q-Learning
We propose a distributed deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is based on the deep…
Massively Parallel Methods for Deep Reinforcement Learning
This work presents the first massively distributed architecture for deep reinforcement learning, using a distributed neural network to represent the value function or behaviour policy, and a distributed store of experience to implement the Deep Q-Network algorithm.
Playing Atari with Deep Reinforcement Learning
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Dueling Network Architectures for Deep Reinforcement Learning
This paper presents a new neural network architecture for model-free reinforcement learning that leads to better policy evaluation in the presence of many similar-valued actions and enables the RL agent to outperform the state-of-the-art on the Atari 2600 domain.
Model-Free reinforcement learning with continuous action in practice
The actor-critic algorithm is applied to learn on a robotic platform with a fast sensorimotor cycle and constitutes an important step towards practical real-time learning control with continuous action.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Reinforcement learning for robots using neural networks
This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively through reinforcement learning, enabling its application to complex robot-learning problems.
Evolving deep unsupervised convolutional networks for vision-based reinforcement learning
Both the MPCNN preprocessor and the RNN controller are evolved successfully to control a car in the TORCS racing simulator using only visual input; this is the first use of deep learning in the context of evolutionary RL.
Parallel reinforcement learning with linear function approximation
In this paper, we investigate the use of parallelization in reinforcement learning (RL), with the goal of learning optimal policies for single-agent RL problems more quickly by using parallel…