Corpus ID: 16326763

Continuous control with deep reinforcement learning

@article{Lillicrap2016ContinuousCW,
  title={Continuous control with deep reinforcement learning},
  author={Timothy P. Lillicrap and Jonathan J. Hunt and Alexander Pritzel and Nicolas Manfred Otto Heess and Tom Erez and Yuval Tassa and David Silver and Daan Wierstra},
  journal={CoRR},
  year={2016},
  volume={abs/1509.02971}
}
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. [...] Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
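The abstract describes adapting DQN's ideas (replay, target networks) to continuous actions via a deterministic actor and a Q-function critic. A minimal sketch of the two core update rules, with illustrative hyperparameter values that are assumptions rather than the paper's:

```python
import numpy as np

GAMMA = 0.99  # discount factor (typical choice; assumed here)
TAU = 0.001   # soft target-update rate (illustrative value)

def td_target(reward, q_next, done, gamma=GAMMA):
    """DQN-style critic target: y = r + gamma * Q'(s', mu'(s')) for
    non-terminal transitions; q_next is the target critic's estimate."""
    return reward + gamma * q_next * (1.0 - done)

def soft_update(target_params, online_params, tau=TAU):
    """Slowly track the online network: theta' <- tau*theta + (1-tau)*theta'."""
    return tau * online_params + (1.0 - tau) * target_params

# Toy usage with scalar reward and flat parameter vectors.
y = td_target(reward=1.0, q_next=2.0, done=0.0)
w = soft_update(np.zeros(3), np.ones(3))
```

The soft update is what keeps the bootstrapped critic targets stable enough for deep function approximation, mirroring the role of the frozen target network in DQN.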
Continuous Deep Q-Learning with Model-based Acceleration
This paper derives a continuous variant of the Q-learning algorithm, called normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and substantially improves performance on a set of simulated robotic control tasks.
The Beta Policy for Continuous Control Reinforcement Learning
Recently, reinforcement learning with deep neural networks has achieved great success in challenging continuous control problems such as 3D locomotion and robotic manipulation. However, in real-world [...]
Particle-Based Adaptive Discretization for Continuous Control using Deep Reinforcement Learning
This paper proposes a general, yet simple, framework for improving the action exploration of policy gradient DRL algorithms that adapts ideas from the particle filtering literature to dynamically discretize the continuous action space and track policies represented as a mixture of Gaussians.
Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution
It is shown that the Beta policy is bias-free and provides significantly faster convergence and higher scores than the Gaussian policy when both are used with trust region policy optimization and actor-critic with experience replay, the state-of-the-art on- and off-policy stochastic methods respectively, on OpenAI Gym's and MuJoCo's continuous control environments.
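The Beta-policy idea above is straightforward to sketch: a Beta distribution has support on [0, 1], so rescaling its samples to the bounded action range puts no probability mass outside the bounds, avoiding the clipping bias of a Gaussian policy. The function name and parameter values below are illustrative:

```python
import numpy as np

def beta_action(alpha, beta, low, high, rng):
    """Sample one bounded action: draw from Beta(alpha, beta) on [0, 1],
    then rescale linearly to [low, high]."""
    x = rng.beta(alpha, beta)
    return low + (high - low) * x

rng = np.random.default_rng(0)
action = beta_action(alpha=2.0, beta=2.0, low=-1.0, high=1.0, rng=rng)
```

With alpha = beta = 2 the distribution is symmetric and unimodal around the midpoint of the range; skewed shape parameters shift mass toward either bound without ever exceeding it.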
Continuous Control in Deep Reinforcement Learning with Direct Policy Derivation from Q Network
This work implements an efficient action-derivation method that allows using Q-learning in real-time continuous control tasks, and shows that in some cases the proposed approach learns a smooth continuous policy while keeping the implementation simplicity of the original discrete-action-space Q-learning algorithm.
Using Deep Reinforcement Learning for the Continuous Control of Robotic Arms
A newly created combination of two commonly used reinforcement learning methods is tested to see whether it is able to learn more effectively than a baseline, reduce training time, and eventually help the algorithm to converge.
Deep Reinforcement Learning in Parameterized Action Space
This paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs within the domain of simulated RoboCup soccer, which features a small set of discrete action types, each of which is parameterized with continuous variables.
Deep Reinforcement Learning for Simulated Autonomous Vehicle Control
We investigate the use of Deep Q-Learning to control a simulated car via reinforcement learning. We start by implementing the approach of [5] ourselves, and then experimenting with various possible [...]
Multi-Pass Q-Networks for Deep Reinforcement Learning with Parameterised Action Spaces
It is empirically demonstrated that MP-DQN significantly outperforms P-DQN and other previous algorithms in terms of data efficiency and converged policy performance on the Platform, Robot Soccer Goal, and Half Field Offense domains.

References

Showing 1–10 of 39 references
From Pixels to Torques: Policy Learning with Deep Dynamical Models
This paper introduces a data-efficient, model-based reinforcement learning algorithm that learns a closed-loop control policy from pixel information only, and facilitates fully autonomous learning from pixels to torques.
Autonomous reinforcement learning with experience replay.
  • P. Wawrzynski, A. Tanwani
  • Neural Networks: The Official Journal of the International Neural Network Society
  • 2013
A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates.
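Experience replay, as used in this reference and in the paper itself, reduces to a fixed-capacity store of transitions sampled uniformly for each update. A minimal sketch; the class name and interface are illustrative, not taken from any of the cited works:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s_next, done) transitions.
    The oldest samples are evicted first; training batches are drawn
    uniformly at random to break temporal correlations."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)

# Toy usage: store five dummy transitions, draw a batch of three.
buf = ReplayBuffer(capacity=100)
for t in range(5):
    buf.push((t, 0.0, 1.0, t + 1, False))
batch = buf.sample(3)
```

Uniform sampling from a large buffer is what lets an off-policy learner reuse old experience, which is central to the data efficiency of DQN-style methods.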
Playing Atari with Deep Reinforcement Learning
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
End-to-End Training of Deep Visuomotor Policies
This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.
Memory-based control with recurrent neural networks
This work extends two related, model-free algorithms for continuous control to solve partially observed domains, using recurrent neural networks trained with backpropagation through time. It finds that recurrent deterministic and stochastic policies are able to learn similarly good solutions to these tasks, including the water maze, where the agent must learn effective search strategies.
Learning Continuous Control Policies by Stochastic Value Gradients
A unified framework for learning continuous control policies using backpropagation is presented, supported by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise.
Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies
GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation, based on a temporal-difference method for learning the gradient of the value function, is proposed; it achieves the best performance to date on the octopus arm.
Real-time reinforcement learning by sequential Actor-Critics and experience replay
It is formally shown that the resulting estimation bias is bounded and asymptotically vanishes, which allows the experience replay-augmented algorithm to preserve the convergence properties of the original algorithm.
Online Evolution of Deep Convolutional Network for Vision-Based Reinforcement Learning
The Max-Pooling Convolutional Neural Network (MPCNN) compressor is evolved online, maximizing the distances between normalized feature vectors computed from the images collected by the recurrent neural network (RNN) controllers during their evaluation in the environment.
Human-level control through deep reinforcement learning
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.