Human-level control through deep reinforcement learning
- Volodymyr Mnih, K. Kavukcuoglu, D. Hassabis
- Computer Science · Nature
- 26 February 2015
This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent capable of learning to excel at a diverse array of challenging tasks.
Playing Atari with Deep Reinforcement Learning
- Volodymyr Mnih, K. Kavukcuoglu, Martin A. Riedmiller
- Computer Science · arXiv
- 19 December 2013
This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning; it outperforms all previous approaches on six of the seven Atari 2600 games tested and surpasses a human expert on three of them.
Deterministic Policy Gradient Algorithms
- David Silver, Guy Lever, N. Heess, T. Degris, Daan Wierstra, Martin A. Riedmiller
- Computer Science · International Conference on Machine Learning
- 21 June 2014
This paper introduces an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy and demonstrates that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
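The deterministic policy gradient rests on a chain rule: differentiate the critic with respect to the action, then the deterministic policy with respect to its parameters. A minimal sketch on a hypothetical 1-D problem where the true action-value is assumed known (in the paper it is a learned off-policy critic):

```python
import numpy as np

# Deterministic policy a = mu(s) = theta * s on a toy 1-D problem where the
# action-value Q(s, a) = -(a - s)^2 is assumed known (hypothetical stand-in
# for the learned critic). The deterministic policy gradient chain rule is
#   grad_theta J = E_s[ grad_theta mu(s) * grad_a Q(s, a)|a=mu(s) ],
# which drives theta toward the optimum theta = 1.
theta = 0.0
rng = np.random.default_rng(0)
for _ in range(200):
    s = rng.normal()                    # state from the behaviour distribution
    a = theta * s                       # deterministic action
    dq_da = -2.0 * (a - s)              # critic gradient w.r.t. the action
    dmu_dtheta = s                      # policy gradient w.r.t. theta
    theta += 0.05 * dmu_dtheta * dq_da  # ascend the deterministic PG
```

Because the gradient needs only the critic's derivative at the single action the policy takes, no integral over the action space is required, which is the source of the advantage in high-dimensional action spaces.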
Striving for Simplicity: The All Convolutional Net
- J. T. Springenberg, A. Dosovitskiy, T. Brox, Martin A. Riedmiller
- Computer Science · International Conference on Learning…
- 21 December 2014
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
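The replacement works because both operations reduce spatial resolution by the same factor; only the aggregation differs. A minimal NumPy sketch (single channel, hypothetical shapes) showing that a stride-2 convolution yields the same output size as a stride-1 convolution followed by 2x2 max-pooling:

```python
import numpy as np

def conv2d(x, k, stride=1):
    """Valid-mode single-channel 2D convolution with a given stride."""
    H, W = x.shape
    kh, kw = k.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = (patch * k).sum()
    return out

def maxpool2x2(x):
    """Non-overlapping 2x2 max-pooling."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.random.randn(8, 8)   # toy input
k = np.random.randn(3, 3)   # toy filter
a = maxpool2x2(conv2d(x, k, stride=1))  # conventional: conv then max-pool
b = conv2d(x, k, stride=2)              # all-convolutional: strided conv
```

Both `a` and `b` come out 3x3; the strided variant lets the network learn its own downsampling instead of hard-coding a max.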
A direct adaptive method for faster backpropagation learning: the RPROP algorithm
- Martin A. Riedmiller, H. Braun
- Computer Science · IEEE International Conference on Neural Networks
- 28 March 1993
RPROP (resilient propagation), a learning algorithm for multilayer feedforward networks, is proposed; it performs a local adaptation of the weight updates according to the behaviour of the error function, overcoming the inherent disadvantages of pure gradient descent.
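The "local adaptation" uses only the sign of each partial derivative: a per-weight step size grows while the sign is stable and shrinks when it flips. A minimal NumPy sketch of one update (following the RPROP- variant, which zeroes the stored gradient on a sign change; the constants are the commonly cited defaults):

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_max=50.0, step_min=1e-6):
    """One RPROP update: adapt per-weight step sizes from gradient signs.

    Sign unchanged -> grow the step (progress is steady);
    sign flipped   -> shrink the step (a minimum was jumped over).
    The weight moves by the step size alone, not the gradient magnitude.
    """
    same_sign = grad * prev_grad > 0
    flipped = grad * prev_grad < 0
    step = np.where(same_sign, np.minimum(step * eta_plus, step_max), step)
    step = np.where(flipped, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(flipped, 0.0, grad)  # RPROP-: suppress update after a flip
    w = w - np.sign(grad) * step
    return w, grad, step
```

Because only signs enter the update, the scheme is insensitive to the vanishing gradient magnitudes that slow plain backpropagation in deep or saturated networks.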
Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method
- Martin A. Riedmiller
- Computer Science · European Conference on Machine Learning
- 3 October 2005
NFQ, an algorithm for efficient and effective training of a Q-value function represented by a multi-layer perceptron, is introduced, and it is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
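The core loop of fitted Q iteration, which NFQ instantiates with a neural network, is: compute bootstrapped targets from a fixed batch of transitions, then re-fit the Q-function to them by batch regression. A sketch on a hypothetical 5-state chain MDP, with a linear least-squares regressor over one-hot features standing in for the paper's multi-layer perceptron:

```python
import numpy as np

# Hypothetical chain MDP: states 0..4, actions {0: left, 1: right},
# reward 1 on reaching the terminal state 4, discount 0.9.
N, GAMMA = 5, 0.9

def env_step(s, a):
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

# Fixed batch of transitions (in NFQ these come from interacting with the plant).
D = [(s, a, *env_step(s, a)) for s in range(N - 1) for a in (0, 1)]

def phi(s, a):
    """One-hot (state, action) features -- a stand-in for the MLP's inputs."""
    x = np.zeros(N * 2)
    x[s * 2 + a] = 1.0
    return x

theta = np.zeros(N * 2)
Q = lambda s, a: phi(s, a) @ theta

for _ in range(50):  # fitted Q iteration: regress onto bootstrapped targets
    X = np.array([phi(s, a) for s, a, *_ in D])
    y = np.array([r + (0.0 if done else GAMMA * max(Q(s2, 0), Q(s2, 1)))
                  for s, a, s2, r, done in D])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Greedy policy recovered from Q: move right in every non-terminal state.
policy = [int(Q(s, 1) >= Q(s, 0)) for s in range(N - 1)]
```

Training on the whole batch at once, rather than discarding each transition after a single online update, is what makes the method data-efficient.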
Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
- Manuel Watter, J. T. Springenberg, J. Boedecker, Martin A. Riedmiller
- Computer Science, Mathematics · NIPS
- 24 June 2015
Embed to Control is introduced, a method for model learning and control of non-linear dynamical systems from raw pixel images that is derived directly from an optimal control formulation in latent space and exhibits strong performance on a variety of complex control problems.
Maximum a Posteriori Policy Optimisation
- A. Abdolmaleki, J. T. Springenberg, Yuval Tassa, R. Munos, N. Heess, Martin A. Riedmiller
- Computer Science · International Conference on Learning…
- 15 February 2018
This work introduces a new algorithm for reinforcement learning called Maximum a posteriori Policy Optimisation (MPO), based on coordinate ascent on a relative-entropy objective, and develops two off-policy algorithms that are competitive with the state of the art in deep reinforcement learning.
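One coordinate of that ascent, the non-parametric E-step, has a closed form: sampled actions are reweighted by the exponentiated Q-values, with a temperature set by the relative-entropy (KL) constraint. A heavily simplified sketch with hypothetical Q-values and a fixed temperature:

```python
import numpy as np

# Non-parametric E-step of MPO (simplified sketch): reweight sampled actions
# by exp(Q / eta). The temperature eta is, in the paper, derived from a KL
# constraint; here it is fixed by hand. The M-step (not shown) would fit the
# parametric policy to these weighted samples by maximum likelihood, which is
# where the "maximum a posteriori" view comes from.
q = np.array([1.0, 2.0, 0.5, 3.0])  # hypothetical Q-values of sampled actions
eta = 1.0
w = np.exp(q / eta)
w /= w.sum()                        # normalised weights over the samples
```

Higher-value actions receive exponentially more weight, so the subsequent policy fit is pulled toward them while the KL constraint keeps the update conservative.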
Emergence of Locomotion Behaviours in Rich Environments
- N. Heess, TB Dhruva, David Silver
- Computer Science · arXiv
- 7 July 2017
This paper explores how a rich environment can help to promote the learning of complex behaviour, and finds that training across diverse environments encourages the emergence of robust behaviours that perform well across a suite of tasks.
Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks
- A. Dosovitskiy, P. Fischer, J. T. Springenberg, Martin A. Riedmiller, T. Brox
- Computer Science · IEEE Transactions on Pattern Analysis and Machine…
- 26 June 2014
While features learned with this approach cannot compete with class-specific features from supervised training on a classification task, it is shown that they are advantageous on geometric matching problems, where they also outperform the SIFT descriptor.