Linear Least-Squares algorithms for temporal difference learning

Abstract

We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD(λ) algorithms, they are more efficient in a statistical sense because they extract more information from training experiences. We describe a simulation experiment showing the substantial improvement in learning rate achieved by RLS TD in an example Markov prediction problem. To quantify this improvement, we introduce the TD error variance of a Markov chain, ωTD, and experimentally conclude that the convergence rate of a TD algorithm depends linearly on ωTD. In addition to converging more rapidly, LS TD and RLS TD have no control parameters, such as a learning rate parameter, thus eliminating the possibility of achieving poor performance through an unlucky choice of parameters.
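The abstract does not reproduce the update equations, but the recursive scheme it describes can be sketched as a standard recursive least-squares step applied to the TD fixed-point problem, maintaining an inverse-matrix estimate via the Sherman-Morrison identity so that no learning-rate parameter is needed. The function name, variable names, and the initialization constant below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def rlstd_update(theta, C, phi, reward, phi_next, gamma):
    """One sketched RLS TD step (assumed form, not the paper's exact notation).

    theta    - current linear value-function parameters
    C        - running estimate of the inverse of the TD statistics matrix
    phi      - feature vector of the current state
    phi_next - feature vector of the successor state (zeros if terminal)
    """
    dphi = phi - gamma * phi_next       # feature-difference vector
    Cphi = C @ phi
    denom = 1.0 + dphi @ Cphi           # scalar gain normalizer
    K = Cphi / denom                    # gain vector (replaces a step size)
    td_err = reward - dphi @ theta      # r + gamma*phi_next.theta - phi.theta
    theta = theta + K * td_err
    C = C - np.outer(K, dphi @ C)       # rank-one Sherman-Morrison update
    return theta, C
```

On a deterministic two-state chain with one-hot features (state 0 yields reward 1 then moves to state 1, which yields reward 2 and terminates), repeating these updates drives theta toward the true discounted values [1 + 0.9*2, 2] = [2.8, 2] with gamma = 0.9, with no step-size tuning, consistent with the parameter-free behavior claimed in the abstract.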

DOI: 10.1007/BF00114723

Received November 1994