Linear Least-Squares Algorithms for Temporal Difference Learning

@article{Bradtke1996LinearLA,
  title={Linear Least-Squares Algorithms for Temporal Difference Learning},
  author={S. Bradtke and A. Barto},
  journal={Machine Learning},
  year={1996},
  volume={22},
  pages={33--57}
}
We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD…
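To make the idea in the abstract concrete, here is a minimal sketch of a recursive least-squares TD(0) update for a linear value function. It is an illustration under stated assumptions, not necessarily the paper's exact formulation: the function name `rls_td`, the regularizer `epsilon`, and the toy two-state chain are hypothetical, and the inverse matrix is maintained with a Sherman-Morrison rank-one update, which is one common way to avoid an explicit matrix inversion.

```python
def rls_td(transitions, n_features, gamma=0.9, epsilon=1e-3):
    """Recursive least-squares TD(0) sketch (hypothetical helper).

    transitions: iterable of (phi, reward, phi_next), where phi and
    phi_next are feature vectors (lists of floats) of length n_features.
    Returns the weight vector theta of the linear value estimate.
    """
    n = n_features
    # P tracks the inverse of A = epsilon*I + sum_t phi_t (phi_t - gamma*phi_next_t)^T
    P = [[(1.0 / epsilon if i == j else 0.0) for j in range(n)] for i in range(n)]
    b = [0.0] * n  # running sum of phi_t * reward_t

    for phi, reward, phi_next in transitions:
        d = [phi[i] - gamma * phi_next[i] for i in range(n)]
        P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]  # P @ phi
        d_P = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]      # d^T @ P
        denom = 1.0 + sum(d[i] * P_phi[i] for i in range(n))
        # Sherman-Morrison: (A + phi d^T)^-1 = P - (P phi)(d^T P) / (1 + d^T P phi)
        for i in range(n):
            for j in range(n):
                P[i][j] -= P_phi[i] * d_P[j] / denom
        for i in range(n):
            b[i] += phi[i] * reward

    # theta = P @ b
    return [sum(P[i][j] * b[j] for j in range(n)) for i in range(n)]


# Toy chain (assumed for illustration): s0 -> s1 with reward 1, s1 -> s0 with reward 0.
s0, s1 = [1.0, 0.0], [0.0, 1.0]  # one-hot (tabular) features
cycle = [(s0, 1.0, s1), (s1, 0.0, s0)]
theta = rls_td(cycle * 500, n_features=2, gamma=0.5)
```

For this toy chain with a discount of 0.5, the Bellman equations give V(s0) = 4/3 and V(s1) = 2/3, so `theta` should approach those values as more transitions are processed. The small `epsilon` keeps the initial matrix invertible at the cost of a slight bias that shrinks with more data.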
Fast gradient-descent methods for temporal-difference learning with linear function approximation
TLDR
Two new related algorithms with better convergence rates are introduced: the first algorithm, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (but still not as fast as conventional TD).
Least Squares SVM for Least Squares TD Learning
TLDR
A QR decomposition based approach is introduced to solve the resulting generalized normal equations incrementally that is numerically more stable than existing recursive least squares based update algorithms and allows a forgetting factor in the updates to track non-stationary target functions.
A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation
We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, behavior policy, and…
Adaptive Lambda Least-Squares Temporal Difference Learning
TLDR
The $\lambda$ selection problem is formalized as a bias-variance trade-off where the solution is the value of $\lambda$ that leads to the smallest Mean Squared Value Error (MSVE).
$\ell_1$ Regularized Gradient Temporal-Difference Learning
TLDR
This work proposes a family of $\ell_1$ regularized GTD algorithms, which employ the well known soft thresholding operator, and investigates convergence properties of the proposed algorithms, and depicts their performance with several numerical experiments.
An efficient L2-norm regularized least-squares temporal difference learning algorithm
TLDR
An efficient recursive least-squares algorithm is proposed for L2-norm regularized LSTD learning and it can eliminate matrix inversion operations and decrease computational complexity effectively.
Regularization and feature selection in least-squares temporal difference learning
TLDR
This paper proposes a regularization framework for the LSTD algorithm, which is robust to irrelevant features and also serves as a method for feature selection, and presents an algorithm similar to the Least Angle Regression algorithm that can efficiently compute the optimal solution.
Stochastic approximation for efficient LSTD and least squares regression
We propose stochastic approximation based methods with randomization of samples in two different settings - one for policy evaluation using the least squares temporal difference (LSTD) algorithm and…
Statistically linearized least-squares temporal differences
  • M. Geist, O. Pietquin
  • Mathematics, Computer Science
  • International Congress on Ultra Modern Telecommunications and Control Systems
  • 2010
TLDR
This paper lifts the restriction of LSTD thanks to a derivative-free statistical linearization approach, which means nonlinear parameterizations and the Bellman optimality operator can be taken into account, and the efficiency of the resulting algorithms is demonstrated.
An actor-critic method using Least Squares Temporal Difference learning
TLDR
This paper uses a Least Squares Temporal Difference algorithm in an actor-critic framework where the actor and the critic operate concurrently, and proves the convergence of the process.