Linear Least-Squares Algorithms for Temporal Difference Learning

Steven J. Bradtke and Andrew G. Barto. Published in Machine Learning.
We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares function approximation. We define an algorithm we call Least-Squares TD (LS TD) for which we prove probability-one convergence when it is used with a function approximator linear in the adjustable parameters. We then define a recursive version of this algorithm, Recursive Least-Squares TD (RLS TD). Although these new TD algorithms require more computation per time-step than do Sutton's TD…
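A minimal pure-Python sketch of the two ideas, assuming linear features, the TD(0) case (no eligibility traces), and transitions given as (phi, reward, phi_next) tuples. Batch LSTD accumulates A = sum phi (phi - gamma phi')^T and b = sum r phi, then solves A theta = b. The recursive variant keeps an estimate P of A^-1 and refreshes it with the Sherman-Morrison formula, so each step costs O(d^2) instead of a fresh O(d^3) solve. The function names, the initialization P = delta * I, and the toy two-state chain are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
# One observed transition: (features of s, reward, features of s').
Transition = Tuple[Sequence[float], float, Sequence[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - acc) / m[i][i]
    return x


def lstd(batch: Sequence[Transition], gamma: float) -> Vector:
    """Batch LSTD(0): solve A theta = b with
    A = sum phi (phi - gamma * phi')^T and b = sum r * phi."""
    d = len(batch[0][0])
    a = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for phi, r, phi_next in batch:
        for i in range(d):
            b[i] += r * phi[i]
            for j in range(d):
                a[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(a, b)


def rls_td(batch: Sequence[Transition], gamma: float, delta: float = 1e6) -> Vector:
    """Recursive LSTD(0): track P ~ A^{-1} with Sherman-Morrison updates.

    The assumed initialization P = delta * I means the result solves
    (A + I / delta) theta = b, i.e. batch LSTD only approximately.
    """
    d = len(batch[0][0])
    p = [[delta if i == j else 0.0 for j in range(d)] for i in range(d)]
    theta = [0.0] * d
    for phi, r, phi_next in batch:
        u = [phi[i] - gamma * phi_next[i] for i in range(d)]
        p_phi = [sum(p[i][j] * phi[j] for j in range(d)) for i in range(d)]
        u_p = [sum(u[i] * p[i][j] for i in range(d)) for j in range(d)]
        denom = 1.0 + sum(u[i] * p_phi[i] for i in range(d))
        gain = [v / denom for v in p_phi]
        # r - u . theta equals r + gamma * V(s') - V(s), the TD error.
        err = r - sum(u[i] * theta[i] for i in range(d))
        theta = [theta[i] + gain[i] * err for i in range(d)]
        # Rank-one downdate of the inverse: P <- P - gain (u^T P).
        p = [[p[i][j] - gain[i] * u_p[j] for j in range(d)] for i in range(d)]
    return theta


if __name__ == "__main__":
    # Two-state chain: s0 -> s1 (reward 0), s1 -> s0 (reward 1), one-hot features.
    chain = [([1.0, 0.0], 0.0, [0.0, 1.0]), ([0.0, 1.0], 1.0, [1.0, 0.0])]
    print(lstd(chain, 0.9), rls_td(chain, 0.9))
```

On this chain with gamma = 0.9, the batch solution is approximately [4.7368, 5.2632] (that is, 0.9/0.19 and 1/0.19), and the recursive estimate agrees to within the small bias introduced by delta.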

Least Squares SVM for Least Squares TD Learning

A QR-decomposition-based approach is introduced to solve the resulting generalized normal equations incrementally; it is numerically more stable than existing recursive-least-squares-based update algorithms and allows a forgetting factor in the updates to track non-stationary target functions.
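The paper's solver is not reproduced here. As a generic illustration of the numerical idea only, the sketch below keeps an exponentially weighted ordinary least-squares problem in triangular form R theta = z and folds each new row in with Givens rotations, scaling old information by sqrt(lam) to implement a forgetting factor lam. Working with the triangular factor instead of an explicitly updated inverse is the usual argument for the better numerical stability of QR-based recursive least squares. Note that this solves min sum lam^(t-i) (y_i - a_i . theta)^2, not the LS-SVM formulation or the non-symmetric LSTD normal equations; the class and method names are hypothetical.

```python
import math
from typing import List, Sequence


class QRRecursiveLeastSquares:
    """Exponentially weighted least squares kept in triangular form R theta = z.

    Minimizes sum_i lam^(t - i) * (y_i - a_i . theta)^2, where lam is the
    forgetting factor. Hypothetical helper for illustration only.
    """

    def __init__(self, dim: int, forgetting: float = 1.0) -> None:
        self.dim = dim
        self.lam = forgetting
        self.r: List[List[float]] = [[0.0] * dim for _ in range(dim)]
        self.z: List[float] = [0.0] * dim

    def update(self, a: Sequence[float], y: float) -> None:
        """Fold one new row (a, y) into R and z with Givens rotations."""
        scale = math.sqrt(self.lam)  # down-weight all past rows by lam
        for i in range(self.dim):
            self.z[i] *= scale
            for j in range(i, self.dim):
                self.r[i][j] *= scale
        row = [float(v) for v in a]
        rhs = float(y)
        for i in range(self.dim):
            h = math.hypot(self.r[i][i], row[i])
            if h == 0.0:
                continue  # nothing to eliminate in this column
            c, s = self.r[i][i] / h, row[i] / h
            # Rotate the pair (R[i], row) so that row[i] becomes zero.
            for j in range(i, self.dim):
                r_ij = self.r[i][j]
                self.r[i][j] = c * r_ij + s * row[j]
                row[j] = -s * r_ij + c * row[j]
            z_i = self.z[i]
            self.z[i] = c * z_i + s * rhs
            rhs = -s * z_i + c * rhs

    def solve(self) -> List[float]:
        """Back-substitution; needs at least dim linearly independent rows."""
        theta = [0.0] * self.dim
        for i in reversed(range(self.dim)):
            acc = sum(self.r[i][j] * theta[j] for j in range(i + 1, self.dim))
            theta[i] = (self.z[i] - acc) / self.r[i][i]
        return theta
```

Feeding rows from one linear target and then from another with lam < 1 lets the estimate follow the most recent target, which is the tracking behavior the summary refers to; with lam = 1 the class reduces to ordinary recursive least squares.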

An efficient L2-norm regularized least-squares temporal difference learning algorithm

Stochastic approximation for efficient LSTD and least squares regression

This paper considers a “big data” regime where both the dimension, d, of the data and the number, T, of training samples are large, and proposes stochastic-approximation-based methods with randomization of samples in two different settings: one for policy evaluation using the least-squares temporal difference (LSTD) algorithm and the other for solving the least-squares regression problem.
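A minimal sketch of the randomization idea for the LSTD setting, assuming the batch is fixed and held in memory: instead of forming and inverting the d x d LSTD matrix, sample one stored transition uniformly at random and apply a TD(0)-style update, which costs O(d) per step. Because the expected update is b_bar - A_bar theta, any limit point satisfies the batch LSTD equations A_bar theta = b_bar; convergence additionally requires a suitable step size and all eigenvalues of A_bar having positive real part. The constant step size, the Polyak-Ruppert iterate averaging, and the function name are illustrative choices, not the paper's exact scheme.

```python
import random
from typing import List, Sequence, Tuple

# One stored transition: (features of s, reward, features of s').
Transition = Tuple[Sequence[float], float, Sequence[float]]


def randomized_td(
    batch: Sequence[Transition],
    gamma: float,
    steps: int,
    step_size: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Approximate the batch LSTD solution with O(d) stochastic updates.

    Each step draws one stored transition uniformly at random and applies a
    TD(0) update; the returned vector is the running average of the iterates.
    """
    rng = random.Random(seed)
    d = len(batch[0][0])
    theta = [0.0] * d
    avg = [0.0] * d
    for n in range(1, steps + 1):
        phi, r, phi_next = batch[rng.randrange(len(batch))]
        v = sum(t * f for t, f in zip(theta, phi))
        v_next = sum(t * f for t, f in zip(theta, phi_next))
        td_error = r + gamma * v_next - v
        theta = [t + step_size * td_error * f for t, f in zip(theta, phi)]
        # Polyak-Ruppert averaging: avg is the mean of theta_1 .. theta_n.
        avg = [m + (t - m) / n for m, t in zip(avg, theta)]
    return avg
```

For self-looping states with one-hot features, the LSTD solution is r / (1 - gamma) per state, which the averaged iterate should approach as the number of steps grows.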

An actor-critic method using Least Squares Temporal Difference learning

This paper uses a Least Squares Temporal Difference algorithm in an actor-critic framework where the actor and the critic operate concurrently, and proves the convergence of the process.

Model-Free Least-Squares Policy Iteration

A new approach to reinforcement learning that combines least-squares function approximation with policy iteration. The method is model-free and completely off-policy, so it can use (or reuse) data collected from any source.
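A compact sketch of least-squares policy iteration with an LSTDQ evaluation step, assuming a small discrete MDP, one-hot (tabular) state-action features, and samples stored as (state, action, reward, next_state) tuples. LSTDQ accumulates A = sum phi(s,a) (phi(s,a) - gamma phi(s', pi(s')))^T and b = sum r phi(s,a) from the fixed sample set, so the same data are reused for every policy; the greedy improvement step then picks argmax_a Q(s,a). The toy MDP, the one-hot features, and the function names are assumptions for illustration; with one-hot features every state-action pair must appear in the samples, otherwise A is singular.

```python
from typing import List, Sequence, Tuple

# One stored sample: (state, action, reward, next_state), from any behavior policy.
Sample = Tuple[int, int, float, int]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - acc) / m[i][i]
    return x


def lstdq(samples: Sequence[Sample], policy: List[int], n_states: int,
          n_actions: int, gamma: float) -> List[float]:
    """LSTDQ with one-hot features; Q is indexed by s * n_actions + a."""
    d = n_states * n_actions
    a = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for s, act, r, s_next in samples:
        i = s * n_actions + act  # active feature of (s, a)
        j = s_next * n_actions + policy[s_next]  # feature of (s', pi(s'))
        a[i][i] += 1.0
        a[i][j] -= gamma
        b[i] += r
    return solve(a, b)


def lspi(samples: Sequence[Sample], n_states: int, n_actions: int,
         gamma: float, max_iter: int = 20) -> Tuple[List[int], List[float]]:
    """Alternate LSTDQ evaluation and greedy improvement on a fixed sample set."""
    policy = [0] * n_states  # arbitrary initial policy
    q: List[float] = []
    for _ in range(max_iter):
        q = lstdq(samples, policy, n_states, n_actions, gamma)
        greedy = [
            max(range(n_actions), key=lambda act: q[s * n_actions + act])
            for s in range(n_states)
        ]
        if greedy == policy:
            break  # policy is stable and q is its Q-function
        policy = greedy
    return policy, q
```

In a two-state example where action 1 ("move") switches state, action 0 ("stay") keeps it, and reward 1 is paid for landing in state 1, the iteration should settle on moving from state 0 and staying in state 1 after two evaluations.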

An Adaptive Policy Evaluation Network Based on Recursive Least Squares Temporal Difference With Gradient Correction

A new adaptive policy evaluation network based on recursive least-squares temporal difference with gradient correction (adaptive RC network) is proposed; it adjusts its network parameters adaptively, in a self-organizing manner, as learning progresses.

An Empirical Study of Least-Squares Algorithms in Reinforcement Learning

The results show that with the formulation of LSTDQ(σ), there is little benefit to the additional flexibility, and a small change to the algorithm is proposed for future work.

Two-Timescale Networks for Nonlinear Value Function Approximation

This work provides a two-timescale network (TTN) architecture that enables linear methods to be used to learn values, with a nonlinear representation learned at a slower timescale, and proves convergence for TTNs.

12-009 Least-squares methods for policy iteration

This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning, and discusses three techniques for solving the core policy-evaluation component of policy iteration: least-squares temporal difference, least-squares policy evaluation, and Bellman residual minimization.
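For reference, the three policy-evaluation techniques can be written in a model-based notation that is assumed here rather than taken from the chapter: feature matrix Phi, diagonal state-weighting matrix D, transition matrix P^pi and reward vector R of the evaluated policy, and step size alpha. LSTD and LSPE share the same fixed point (with alpha = 1, each LSPE step is exactly a weighted least-squares fit toward the one-step backup), while Bellman residual minimization solves a different set of normal equations and in general yields a different solution.

```latex
% LSTD: solve the projected Bellman equation in one shot
\Phi^\top D\,(\Phi - \gamma P^\pi \Phi)\,\theta = \Phi^\top D\,R

% LSPE: repeated weighted least-squares fits toward the one-step backup
\theta_{k+1} = \theta_k + \alpha\,(\Phi^\top D \Phi)^{-1}\,\Phi^\top D\,
  \bigl(R + \gamma P^\pi \Phi \theta_k - \Phi \theta_k\bigr)

% BRM: minimize the Bellman residual directly
\min_\theta \bigl\lVert R + \gamma P^\pi \Phi \theta - \Phi \theta \bigr\rVert_D^2
\;\Longrightarrow\;
(\Phi - \gamma P^\pi \Phi)^\top D\,(\Phi - \gamma P^\pi \Phi)\,\theta
  = (\Phi - \gamma P^\pi \Phi)^\top D\,R
```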

Sparse Temporal Difference Learning via Alternating Direction Method of Multipliers

  • Nikos Tsipinakis, J. Nelson
    2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)
  • 2015
This paper proposes a new algorithm, based on the Alternating Direction Method of Multipliers (ADMM), for approximating the fixed point, and demonstrates with experimental results that the proposed algorithm is more stable than prior work when used for policy iteration.
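The summary does not state the exact objective, so the sketch below assumes one simple formulation: fit the sample LSTD statistics A_hat, b_hat while encouraging sparsity, min_theta 0.5 * ||A_hat theta - b_hat||^2 + lam * ||theta||_1, and solve it with the standard ADMM splitting for the lasso (an x-update through (A_hat^T A_hat + rho I)^-1, a soft-thresholding z-update, and a scaled dual update). The function names, the penalty lam, and the parameter rho are hypothetical, and this is not claimed to be the paper's algorithm.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        acc = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - acc) / m[i][i]
    return x


def soft_threshold(v: float, k: float) -> float:
    """Proximal operator of k * |v|: shrink v toward zero by k."""
    return max(v - k, 0.0) - max(-v - k, 0.0)


def admm_l1_lstd(a_mat: Sequence[Sequence[float]], b_vec: Sequence[float],
                 lam: float, rho: float = 1.0, iters: int = 200) -> List[float]:
    """Hypothetical ADMM solver for min 0.5*||A theta - b||^2 + lam*||theta||_1."""
    rows, d = len(a_mat), len(a_mat[0])
    # A^T A + rho * I and A^T b stay fixed across iterations.
    lhs = [[sum(a_mat[k][i] * a_mat[k][j] for k in range(rows))
            + (rho if i == j else 0.0) for j in range(d)] for i in range(d)]
    atb = [sum(a_mat[k][i] * b_vec[k] for k in range(rows)) for i in range(d)]
    x, z, u = [0.0] * d, [0.0] * d, [0.0] * d
    for _ in range(iters):
        # Least-squares step pulled toward the current sparse estimate.
        x = solve(lhs, [atb[i] + rho * (z[i] - u[i]) for i in range(d)])
        # Sparsity step: soft-threshold with threshold lam / rho.
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(d)]
        # Scaled dual update accumulates the disagreement between x and z.
        u = [u[i] + x[i] - z[i] for i in range(d)]
    return z
```

Returning z rather than x gives exact zeros in the solution. Since lhs never changes, a more efficient implementation would factor it once instead of re-solving from scratch at every iteration.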