Continuous control with deep reinforcement learning
TLDR
This work presents an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces, and demonstrates that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Deterministic Policy Gradient Algorithms
TLDR
This paper introduces an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy and demonstrates that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
Recurrent Models of Visual Attention
TLDR
This paper presents a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution.
Relational inductive biases, deep learning, and graph networks
TLDR
It is argued that combinatorial generalization must be a top priority for AI to achieve human-like abilities, and that structured representations and computations are key to realizing this objective.
Sample Efficient Actor-Critic with Experience Replay
This paper presents an actor-critic deep reinforcement learning agent with experience replay that is stable, sample efficient, and performs remarkably well on challenging environments, including the ...
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network ...
Emergence of Locomotion Behaviours in Rich Environments
TLDR
This paper explores how a rich environment can help to promote the learning of complex behavior, and finds that this encourages the emergence of robust behaviours that perform well across a suite of tasks.
Learning Continuous Control Policies by Stochastic Value Gradients
TLDR
This paper presents a unified framework for learning continuous control policies using backpropagation, supported by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise.
Distributed Distributional Deterministic Policy Gradients
TLDR
The results show that, across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks, the D4PG algorithm achieves state-of-the-art performance.
Maximum a Posteriori Policy Optimisation
TLDR
This work introduces a new algorithm for reinforcement learning called Maximum a Posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective and develops two off-policy algorithms that are competitive with the state-of-the-art in deep reinforcement learning.