Reinforcement Learning: An Introduction
This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Policy Gradient Methods for Reinforcement Learning with Function Approximation
This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation converges to a locally optimal policy.
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Introduction to Reinforcement Learning
In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Learning to Predict by the Methods of Temporal Differences
- R. Sutton
- Psychology, Machine Learning
- 1 August 1988
This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. It proves their convergence and optimality for special cases and relates them to supervised-learning methods.
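The incremental prediction idea can be illustrated with a minimal TD(0) sketch on a hypothetical five-state random walk (the task, constants, and variable names here are illustrative, not taken from the article):

```python
import random

random.seed(0)
alpha, gamma = 0.1, 1.0
n_states = 5                      # states 0..4; episodes end left of 0 or right of 4
V = [0.0] * n_states              # value estimates, updated incrementally

for episode in range(2000):
    s = n_states // 2             # start in the middle
    while True:
        s2 = s + random.choice((-1, 1))
        if s2 < 0:                # left terminal, reward 0
            V[s] += alpha * (0.0 - V[s])
            break
        if s2 >= n_states:        # right terminal, reward 1
            V[s] += alpha * (1.0 - V[s])
            break
        # TD(0): move V[s] toward the bootstrapped target r + gamma * V[s2]
        V[s] += alpha * (0.0 + gamma * V[s2] - V[s])
        s = s2

print([round(v, 2) for v in V])   # approaches the true values 1/6, 2/6, ..., 5/6
```

Each update uses only the current estimate of the successor state, rather than waiting for the final outcome as a supervised (Monte Carlo) learner would.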
Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming
- R. Sutton
- Computer Science, ML
- 1 June 1990
Neuronlike adaptive elements that can solve difficult learning control problems
- A. Barto, R. Sutton, C. Anderson
- Computer Science, IEEE Transactions on Systems, Man, and…
- 1 September 1983
It is shown how a system consisting of two neuronlike adaptive elements can solve a difficult learning control problem. The paper also discusses the relation of this work to classical and instrumental conditioning in animal learning studies and its possible implications for research in the neurosciences.
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
- R. Sutton
- Computer Science, NIPS
- 27 November 1995
It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
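Sparse coarse coding can be sketched in a few lines of toy tile coding for a 1-D input; the function names and tiling parameters below are illustrative assumptions, not taken from the paper:

```python
def active_tiles(x, n_tilings=8, tiles_per_tiling=10, lo=0.0, hi=1.0):
    """Return the index of the one active tile in each offset tiling."""
    indices = []
    width = (hi - lo) / tiles_per_tiling
    for t in range(n_tilings):
        offset = t * width / n_tilings        # each tiling is shifted slightly
        i = int((x - lo + offset) / width)
        i = min(i, tiles_per_tiling)          # clamp at the upper edge
        indices.append(t * (tiles_per_tiling + 1) + i)
    return indices

# Linear function approximation: the value estimate is the sum of the
# weights of the few active tiles, so updates are sparse and local.
weights = [0.0] * (8 * 11)
for idx in active_tiles(0.37):
    weights[idx] += 0.5 / 8                   # e.g., move the estimate toward 0.5
estimate = sum(weights[i] for i in active_tiles(0.37))
print(round(estimate, 2))                     # → 0.5
```

Because only one tile per tiling is active for any input, each update touches just `n_tilings` weights while still generalizing to nearby inputs that share tiles.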
Fast gradient-descent methods for temporal-difference learning with linear function approximation
Two new related algorithms with better convergence rates are introduced. The first, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (though still not as fast as conventional TD).
Eligibility Traces for Off-Policy Policy Evaluation
This paper considers the off-policy version of the policy evaluation problem, for which only one eligibility trace algorithm, a Monte Carlo method, is known. It analyzes and compares this method and four new eligibility trace algorithms, emphasizing their relationships to the classical statistical technique known as importance sampling.
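The core importance sampling idea behind these methods can be shown with a minimal sketch on a hypothetical one-step, two-action problem (the policies, rewards, and sample size are illustrative assumptions, not the paper's algorithms): returns generated by a behavior policy are reweighted by the likelihood ratio of the target policy to the behavior policy.

```python
import random

random.seed(1)
behavior = {'a': 0.5, 'b': 0.5}   # policy that generated the data
target = {'a': 0.9, 'b': 0.1}     # policy we want to evaluate
reward = {'a': 1.0, 'b': 0.0}

total, n = 0.0, 100_000
for _ in range(n):
    action = random.choices(['a', 'b'], weights=[0.5, 0.5])[0]
    rho = target[action] / behavior[action]   # importance sampling ratio
    total += rho * reward[action]

# The reweighted average estimates the target policy's expected return,
# E_target[R] = 0.9 * 1.0 + 0.1 * 0.0 = 0.9, using only behavior-policy data.
print(total / n)
```

In the multi-step eligibility trace setting, the single ratio above becomes a product of per-step ratios along the trajectory.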