Policy Gradient Methods for Reinforcement Learning with Function Approximation
TLDR
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
TLDR
It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
Introduction
TLDR
Findings from pilot studies and literature reviews are highlighted that might help the clinician and patient better differentiate between anti-VEGF drugs.
Learning to Act Using Real-Time Dynamic Programming
TLDR
An algorithm based on dynamic programming, called Real-Time DP, is introduced, by which an embedded system can improve its performance with experience, and which illuminates aspects of other DP-based reinforcement learning methods such as Watkins' Q-Learning algorithm.
Near-Optimal Reinforcement Learning in Polynomial Time
TLDR
New algorithms for reinforcement learning are presented and it is proved that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes.
Eligibility Traces for Off-Policy Policy Evaluation
TLDR
This paper considers the off-policy version of the policy evaluation problem, for which only one eligibility trace algorithm is known, a Monte Carlo method, and analyzes and compares this and four new eligibility trace algorithms, emphasizing their relationships to the classical statistical technique known as importance sampling.
Action-Conditional Video Prediction using Deep Networks in Atari Games
TLDR
This paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs and proposes and evaluates two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks.
Graphical Models for Game Theory
TLDR
The main result is a provably correct and efficient algorithm for computing approximate Nash equilibria in one-stage games represented by trees or sparse graphs.
Predictive Representations of State
TLDR
This is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls), and it is shown that any system has a linear predictive state representation with a number of predictions no greater than the number of states in its minimal POMDP model.
Intrinsically Motivated Reinforcement Learning
TLDR
Initial results from a computational study of intrinsically motivated reinforcement learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills that are needed for competent autonomy are presented.