# Linear Least-Squares Algorithms for Temporal Difference Learning

```bibtex
@article{Bradtke1996LinearLA,
  title   = {Linear Least-Squares Algorithms for Temporal Difference Learning},
  author  = {S. Bradtke and A. Barto},
  journal = {Machine Learning},
  year    = {1996},
  volume  = {22},
  pages   = {33-57}
}
```

We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD…
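The two algorithms described in the abstract can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: `lstd` accumulates the matrix A and vector b over observed transitions and solves A θ = b directly, while `rls_td` maintains an estimate of A⁻¹ via the Sherman-Morrison identity so each update costs O(n²) rather than O(n³). The function names, the `reg`/`eps` regularization terms, and the `(feature, reward, next_feature)` transition format are assumptions made for this sketch.

```python
import numpy as np

def lstd(transitions, n_features, gamma=1.0, reg=1e-6):
    # Batch LS TD: accumulate A and b over transitions, solve once.
    # Each transition is (phi, r, phi_next); phi_next is the zero
    # vector at terminal states. (Format is illustrative.)
    A = reg * np.eye(n_features)   # small ridge term keeps A invertible
    b = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

def rls_td(transitions, n_features, gamma=1.0, eps=1e-6):
    # Recursive variant: maintain C ~ A^{-1} with the Sherman-Morrison
    # rank-one update, avoiding an explicit matrix inversion per step.
    C = np.eye(n_features) / eps
    theta = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        d = phi - gamma * phi_next        # temporal-difference feature vector
        Cphi = C @ phi
        k = Cphi / (1.0 + d @ Cphi)       # gain vector
        theta = theta + k * (r - d @ theta)
        C = C - np.outer(k, d @ C)
    return theta
```

On a toy three-state chain with one-hot features (rewards 0, 0, 1 along the chain, γ = 1), both routines recover the exact values θ ≈ [1, 1, 1] from a single episode, since the features span the state space.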


#### 249 Citations

Fast gradient-descent methods for temporal-difference learning with linear function approximation

- Mathematics, Computer Science
- ICML '09
- 2009

Two new related algorithms with better convergence rates are introduced: the first algorithm, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (but still not as fast as conventional TD).

Least Squares SVM for Least Squares TD Learning

- Computer Science
- ECAI
- 2006

A QR decomposition based approach is introduced to solve the resulting generalized normal equations incrementally that is numerically more stable than existing recursive least squares based update algorithms and allows a forgetting factor in the updates to track non-stationary target functions.

A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation

- Mathematics
- NIPS 2008
- 2008

We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, behavior policy, and…

Adaptive Lambda Least-Squares Temporal Difference Learning

- Mathematics, Computer Science
- ArXiv
- 2016

The $\lambda$ selection problem is formalized as a bias-variance trade-off where the solution is the value of $\lambda$ that leads to the smallest Mean Squared Value Error (MSVE).

$\ell_1$ Regularized Gradient Temporal-Difference Learning

- Computer Science
- 2016

This work proposes a family of $\ell_1$ regularized GTD algorithms, which employ the well-known soft thresholding operator, investigates the convergence properties of the proposed algorithms, and demonstrates their performance in several numerical experiments.

An efficient L2-norm regularized least-squares temporal difference learning algorithm

- Computer Science
- Knowl. Based Syst.
- 2013

An efficient recursive least-squares algorithm is proposed for L2-norm regularized LSTD learning, which eliminates matrix-inversion operations and effectively reduces computational complexity.

Regularization and feature selection in least-squares temporal difference learning

- Mathematics, Computer Science
- ICML '09
- 2009

This paper proposes a regularization framework for the LSTD algorithm, which is robust to irrelevant features and also serves as a method for feature selection, and presents an algorithm similar to the Least Angle Regression algorithm that can efficiently compute the optimal solution.

Stochastic approximation for efficient LSTD and least squares regression

- Mathematics
- 2014

We propose stochastic approximation based methods with randomization of samples in two different settings - one for policy evaluation using the least squares temporal difference (LSTD) algorithm and…

Statistically linearized least-squares temporal differences

- Mathematics, Computer Science
- International Congress on Ultra Modern Telecommunications and Control Systems
- 2010

This paper lifts the restriction of LSTD thanks to a derivative-free statistical linearization approach, which means nonlinear parameterizations and the Bellman optimality operator can be taken into account, and the efficiency of the resulting algorithms is demonstrated.

An actor-critic method using Least Squares Temporal Difference learning

- Computer Science
- Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with the 2009 28th Chinese Control Conference
- 2009

This paper uses a Least Squares Temporal Difference algorithm in an actor-critic framework where the actor and the critic operate concurrently, and proves the convergence of the process.