Publications
Deterministic Policy Gradient Algorithms
TLDR
This paper introduces an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy and demonstrates that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
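The result underpinning these algorithms is the deterministic policy gradient theorem established in the paper: the policy gradient requires no expectation over actions, only the critic's action-gradient evaluated at the actor's output. With μ_θ the deterministic policy and ρ^μ its discounted state distribution:

```latex
\nabla_\theta J(\mu_\theta)
  = \mathbb{E}_{s \sim \rho^{\mu}}\!\left[
      \nabla_\theta \mu_\theta(s)\,
      \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)}
    \right]
```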
Off-Policy Actor-Critic
TLDR
This paper derives an incremental, linear time and space complexity algorithm that includes eligibility traces, proves convergence under assumptions similar to previous off-policy algorithms, and empirically shows better or comparable performance to existing algorithms on standard reinforcement-learning benchmark problems.
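A minimal sketch of the actor side of such an update under linear function approximation; the names (`grad_log_pi`, `rho`, `delta`) are illustrative, the exact trace decay (e.g., whether the discount enters) follows the paper only approximately, and the TD error is assumed to come from an off-policy critic such as GTD(λ):

```python
import numpy as np

def offpac_actor_update(u, e_u, grad_log_pi, rho, delta, alpha_u=0.01, lam=0.8):
    """One Off-PAC-style actor update with eligibility traces (sketch).

    rho   : importance-sampling ratio pi(a|s) / b(a|s)
    delta : TD error supplied by the off-policy critic
    """
    e_u = rho * (grad_log_pi + lam * e_u)  # trace, corrected by the importance ratio
    u = u + alpha_u * delta * e_u          # step the actor along the trace
    return u, e_u

# illustrative call with 4 features
u, e_u = np.zeros(4), np.zeros(4)
u, e_u = offpac_actor_update(u, e_u, grad_log_pi=0.1 * np.ones(4), rho=1.2, delta=0.5)
```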
Deep Reinforcement Learning in Large Discrete Action Spaces
TLDR
This paper leverages prior information about the actions to embed them in a continuous space upon which it can generalize, and uses approximate nearest-neighbor methods to allow reinforcement learning methods to be applied to large-scale learning problems previously intractable with current methods.
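The action-selection step described here can be sketched directly: the actor emits a continuous "proto-action", nearest neighbours in the embedding space provide candidate discrete actions, and a critic refines the choice. Exact nearest neighbours are used below for clarity where the paper uses approximate methods; `actor`, `critic`, and `action_embeddings` are placeholders:

```python
import numpy as np

def select_action(actor, critic, state, action_embeddings, k=10):
    """Nearest-neighbour action selection over embedded discrete actions (sketch).

    action_embeddings : (n_actions, d) matrix, one row per discrete action
    actor(state)      : returns a proto-action in R^d
    critic(state, e)  : returns a scalar value for action embedding e
    """
    proto = actor(state)                               # continuous proto-action
    dists = np.linalg.norm(action_embeddings - proto, axis=1)
    candidates = np.argsort(dists)[:k]                 # k nearest discrete actions
    values = [critic(state, action_embeddings[i]) for i in candidates]
    return int(candidates[int(np.argmax(values))])     # refine the choice with the critic
```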
Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
TLDR
This paper presents results from running Horde on a multi-sensored mobile robot, which successfully learns goal-oriented behaviors and long-term predictions from off-policy experience.
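Each of Horde's many parallel "demons" is an off-policy prediction learner for one general value function. A minimal sketch of such a demon with a GTD(λ)-style update under linear features; all names are illustrative, with the cumulant generalizing the reward and `gamma_next` generalizing the discount:

```python
import numpy as np

class Demon:
    """One Horde demon: a GTD(lambda)-style off-policy GVF learner (sketch)."""

    def __init__(self, n_features, alpha=0.01, beta=0.001, lam=0.9):
        self.w = np.zeros(n_features)  # primary weights: the prediction
        self.h = np.zeros(n_features)  # auxiliary weights for the gradient correction
        self.e = np.zeros(n_features)  # eligibility trace
        self.alpha, self.beta, self.lam = alpha, beta, lam

    def update(self, phi, phi_next, cumulant, gamma_next, rho):
        """rho = pi(a|s) / b(a|s): importance ratio of target vs. behaviour policy."""
        delta = cumulant + gamma_next * (self.w @ phi_next) - self.w @ phi
        self.e = rho * (phi + gamma_next * self.lam * self.e)
        self.w += self.alpha * (delta * self.e
                  - gamma_next * (1 - self.lam) * (self.e @ self.h) * phi_next)
        self.h += self.beta * (delta * self.e - (self.h @ phi) * phi)
```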
Vector-based navigation using grid-like representations in artificial agents
TLDR
These findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation, and support neuroscientific theories that see grid cells as critical for vector-based navigation.
The Predictron: End-To-End Learning and Planning
TLDR
The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple "imagined" planning steps, accumulating internal rewards and values over multiple planning depths.
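That accumulation can be sketched directly: roll the abstract model forward and, at each depth, fold in an internal reward, discount, and value estimate. `model.step` and `model.value` are illustrative stand-ins for the learned components:

```python
def preturns(model, state, depth):
    """k-step 'preturns' g_0..g_depth of a predictron-style abstract model (sketch).

    model.step(s)  -> (next_abstract_state, internal_reward, internal_discount)
    model.value(s) -> scalar value estimate of abstract state s
    """
    g = [model.value(state)]             # g_0: no planning steps
    acc_r, acc_gamma, s = 0.0, 1.0, state
    for _ in range(depth):
        s, r, gamma = model.step(s)      # one "imagined" planning step
        acc_r += acc_gamma * r           # accumulate internal rewards
        acc_gamma *= gamma               # accumulate internal discounts
        g.append(acc_r + acc_gamma * model.value(s))
    return g                             # one estimate per planning depth
```

The paper additionally learns λ-weights that mix these depth-wise estimates into a single λ-preturn target.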
Model-Free reinforcement learning with continuous action in practice
TLDR
The actor-critic algorithm is applied to learn on a robotic platform with a fast sensorimotor cycle and constitutes an important step towards practical real-time learning control with continuous actions.
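A common form of such an algorithm, consistent with this line of work, is a Gaussian policy whose mean is linear in the features, paired with a linear TD critic. A minimal sketch with a fixed exploration scale; names and hyperparameters are illustrative:

```python
import numpy as np

def gaussian_ac_update(w, u, phi, a, reward, phi_next,
                       sigma=0.5, alpha_w=0.1, alpha_u=0.01, gamma=0.99):
    """One transition's worth of Gaussian-policy actor-critic updates (sketch).

    w : critic weights (linear state value), u : actor weights (policy mean)
    a : the continuous action taken, sampled as u @ phi + sigma * noise
    """
    delta = reward + gamma * (w @ phi_next) - w @ phi  # TD error
    w = w + alpha_w * delta * phi                      # critic: TD(0) update
    grad_log_pi = ((a - u @ phi) / sigma**2) * phi     # d/du log N(a; u@phi, sigma)
    u = u + alpha_u * delta * grad_log_pi              # actor: policy-gradient step
    return w, u
```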
Learning the structure of Factored Markov Decision Processes in reinforcement learning problems
TLDR
This paper describes SPITI, an instantiation of SDYNA that combines incremental decision tree induction for learning the structure of a problem with an incremental version of the Structured Value Iteration algorithm. SPITI can build a factored representation of a reinforcement learning problem and may improve the policy faster than tabular reinforcement learning algorithms.
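The flavour of the structure-learning step, one decision tree per next-state feature trained on observed transitions, can be sketched with an off-the-shelf batch tree learner standing in for the incremental induction that SPITI actually uses; everything below is illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_transition_trees(transitions, n_state_features):
    """Learn one tree per next-state feature from observed transitions (sketch).

    transitions : list of (state_features, action, next_state_features)
    Each tree approximates P(s'_i | s, a); its split variables expose which
    parent features s'_i actually depends on, i.e., the factored structure.
    """
    X = np.array([np.append(s, a) for s, a, _ in transitions])
    trees = []
    for i in range(n_state_features):
        y = np.array([s_next[i] for _, _, s_next in transitions])
        trees.append(DecisionTreeClassifier(max_depth=5).fit(X, y))
    return trees
```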
Tuning-free step-size adaptation
TLDR
This paper introduces a series of modifications and normalizations to the IDBD method that together eliminate the need to tune the meta-step-size parameter to the particular problem, and shows that the resulting overall algorithm, called Autostep, performs as well as or better than existing step-size adaptation methods on a number of idealized and robot prediction problems without requiring any tuning of its meta-step-size parameter.
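For context, the IDBD base learner that Autostep modifies can be written compactly for a linear predictor; Autostep's added normalizations (omitted here) are what remove the sensitivity to the meta-step-size `theta`:

```python
import numpy as np

def idbd_update(w, h, beta, x, y, theta=0.01):
    """One IDBD update (Sutton, 1992) for a linear predictor (sketch).

    beta holds per-feature log step sizes; theta is the meta step size
    whose tuning Autostep's normalizations are designed to eliminate.
    """
    delta = y - w @ x                               # prediction error
    beta = beta + theta * delta * x * h             # meta-gradient on log step sizes
    alpha = np.exp(beta)                            # per-feature step sizes
    w = w + alpha * delta * x                       # base learner update
    h = h * np.clip(1.0 - alpha * x * x, 0.0, None) + alpha * delta * x
    return w, h, beta
```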
Linear Off-Policy Actor-Critic
TLDR
This paper derives an incremental, linear time and space complexity algorithm that includes eligibility traces, proves convergence under assumptions similar to previous off-policy algorithms, and empirically shows better or comparable performance to existing algorithms on standard reinforcement-learning benchmark problems.