- Published 1996 in Machine Learning

We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD), for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, ωTD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on ωTD. In addition to converging more rapidly, LS TD and RLS TD have no control parameters, such as a learning rate, thus eliminating the possibility of achieving poor performance through an unlucky choice of parameters.
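The core idea behind the two algorithms can be sketched as follows. LS TD accumulates the linear system A θ = b from observed transitions and solves it in one batch step; RLS TD maintains the same solution incrementally by updating an estimate of A⁻¹ with a Sherman–Morrison rank-one update. The sketch below is an illustration of this idea for the undiscounted-to-discounted LSTD(0) form with discount factor `gamma`, not the authors' exact formulation; the function names, the feature representation, and the initialization constant `delta` (which regularizes the initial inverse in RLS TD) are assumptions for the example.

```python
import numpy as np

def ls_td(transitions, gamma=0.9):
    """Batch least-squares TD sketch: solve A theta = b, where
    A = sum phi_t (phi_t - gamma phi_{t+1})^T and b = sum r_t phi_t.

    transitions: list of (phi, r, phi_next) with 1-D feature vectors;
    terminal successors are represented by a zero feature vector.
    """
    k = len(transitions[0][0])
    A = np.zeros((k, k))
    b = np.zeros(k)
    for phi, r, phi_next in transitions:
        A += np.outer(phi, phi - gamma * phi_next)
        b += r * phi
    return np.linalg.solve(A, b)

def rls_td(transitions, gamma=0.9, delta=1.0):
    """Recursive variant: track P ~ A^{-1} online via Sherman-Morrison,
    so each step costs O(k^2) instead of re-solving the system."""
    k = len(transitions[0][0])
    P = np.eye(k) / delta        # initial (regularized) inverse estimate
    theta = np.zeros(k)
    for phi, r, phi_next in transitions:
        u = phi - gamma * phi_next
        K = (P @ phi) / (1.0 + u @ P @ phi)   # gain vector
        theta = theta + K * (r - u @ theta)   # correct toward TD target
        P = P - np.outer(K, u @ P)            # rank-1 inverse update
    return theta
```

On a deterministic two-state chain with one-hot features (state 1 → state 2 → terminal, reward 1 per step, gamma = 0.9), `ls_td` recovers the exact values [1.9, 1.0]; `rls_td` converges to the same solution as the initial regularization is washed out by data.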

Semantic Scholar estimates that this publication has **511** citations based on the available data.


@article{Bradtke1996LinearLA,
title={Linear Least-Squares algorithms for temporal difference learning},
author={Steven J. Bradtke and Andrew G. Barto},
journal={Machine Learning},
year={1996},
volume={22},
pages={33-57}
}