Policy Gradient Methods for Reinforcement Learning with Function Approximation
This paper gives the first proof that a version of policy iteration with arbitrary differentiable function approximation converges to a locally optimal policy.
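For context, the policy gradient theorem at the core of this result can be written as follows (notation paraphrased rather than quoted, with ρ the performance measure, d^π the state distribution induced by the policy, and Q^π the action-value function):

```latex
\frac{\partial \rho}{\partial \theta}
  = \sum_{s} d^{\pi}(s) \sum_{a} \frac{\partial \pi(s, a)}{\partial \theta}\, Q^{\pi}(s, a)
```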
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Learning to Act Using Real-Time Dynamic Programming
Near-Optimal Reinforcement Learning in Polynomial Time
New algorithms for reinforcement learning are presented and it is proved that they have polynomial bounds on the resources required to achieve near-optimal return in general Markov decision processes.
Eligibility Traces for Off-Policy Policy Evaluation
- Doina Precup, R. Sutton, Satinder Singh
- Computer Science · International Conference on Machine Learning
- 29 June 2000
This paper considers the off-policy version of the policy evaluation problem, for which only one eligibility trace algorithm, a Monte Carlo method, was previously known; it analyzes and compares this method and four new eligibility trace algorithms, emphasizing their relationships to the classical statistical technique of importance sampling.
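As a rough illustration of the importance-sampling idea the summary refers to (not the paper's specific trace algorithms), here is a minimal sketch of ordinary trajectory-level importance sampling for off-policy evaluation; the `episodes`, `target_policy`, and `behavior_policy` names are assumptions for the example:

```python
import numpy as np

def is_return_estimate(episodes, target_policy, behavior_policy, gamma=0.99):
    """Trajectory-level importance-sampling estimate of the target policy's
    expected return from episodes generated by a behavior policy.

    Each episode is a list of (state, action, reward) tuples; each policy
    maps (state, action) -> probability of taking that action.
    """
    estimates = []
    for episode in episodes:
        rho = 1.0        # product of per-step likelihood ratios pi / b
        g = 0.0          # discounted return of the episode
        discount = 1.0
        for state, action, reward in episode:
            rho *= target_policy(state, action) / behavior_policy(state, action)
            g += discount * reward
            discount *= gamma
        estimates.append(rho * g)
    return float(np.mean(estimates))
```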
Action-Conditional Video Prediction using Deep Networks in Atari Games
- Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L. Lewis, Satinder Singh
- Computer Science · NIPS
- 31 July 2015
This paper is the first to make and evaluate long-term predictions of high-dimensional video conditioned on control inputs; it proposes and evaluates two deep neural network architectures consisting of encoding, action-conditional transformation, and decoding layers built on convolutional and recurrent neural networks.
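A minimal sketch of this encode / action-conditional transform / decode pattern, assuming PyTorch and 84x84 input frames; the layer sizes and the multiplicative action interaction are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class ActionConditionalPredictor(nn.Module):
    """Sketch of an encode / action-conditional transform / decode network.
    Layer sizes are illustrative; input frames are assumed to be 84x84."""

    def __init__(self, num_actions, channels=3):
        super().__init__()
        # Encoder: convolutions mapping the current frame to a feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 9 * 9, 512), nn.ReLU(),
        )
        # Action-conditional transformation: multiplicative interaction
        # between the encoded frame features and the chosen action.
        self.frame_factor = nn.Linear(512, 512, bias=False)
        self.action_factor = nn.Linear(num_actions, 512, bias=False)
        # Decoder: deconvolutions back to a predicted next frame.
        self.decoder_fc = nn.Linear(512, 64 * 9 * 9)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, kernel_size=8, stride=4),
        )

    def forward(self, frame, action_onehot):
        h = self.encoder(frame)                      # (batch, 512)
        h = self.frame_factor(h) * self.action_factor(action_onehot)
        h = self.decoder_fc(h).view(-1, 64, 9, 9)    # back to a conv feature map
        return self.decoder(h)                       # (batch, channels, 84, 84)
```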
Predictive Representations of State
This is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls), and it is shown that any system has a linear predictive state representation with no more predictions than the number of states in its minimal POMDP model.
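To make the term concrete: in a linear predictive state representation (notation loosely following the PSR literature, not quoted from the paper), the prediction of any test t given history h is a linear function of the predictions of a small set of core tests Q, with m_t a weight vector associated with t:

```latex
p(t \mid h) = p(Q \mid h)^{\top} m_t
```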
Graphical Models for Game Theory
- M. Kearns, M. Littman, Satinder Singh
- Computer Science · Conference on Uncertainty in Artificial Intelligence
- 2 August 2001
The main result is a provably correct and efficient algorithm for computing approximate Nash equilibria in one-stage games represented by trees or sparse graphs.
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms
A rigorous proof of convergence of DP-based learning algorithms is provided by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem, which establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
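For a concrete member of that class, here is a minimal sketch of a single tabular Q-learning step, the kind of stochastic-approximation update the theorem covers; the function name and array layout are assumptions for the example:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step.  Q is a (num_states x num_actions)
    array; alpha is the step size and gamma the discount factor."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```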
Intrinsically Motivated Reinforcement Learning
This paper presents initial results from a computational study of intrinsically motivated reinforcement learning aimed at allowing artificial agents to construct and extend hierarchies of reusable skills needed for competent autonomy.