Corpus ID: 231741085

On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning

@inproceedings{Durmus2021OnTS,
  title={On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning},
  author={Alain Durmus and {\'E}ric Moulines and Alexey Naumov and Sergey Samsonov and Hoi-To Wai},
  booktitle={COLT},
  year={2021}
}
This paper studies the exponential stability of random matrix products driven by a Markov chain on a general (possibly unbounded) state space. This stability property is a cornerstone in the analysis of stochastic algorithms in machine learning (e.g., for parameter tracking in online learning or reinforcement learning). Existing results impose strong conditions such as uniform boundedness of the matrix-valued functions and uniform ergodicity of the Markov chain. Our main contribution is an exponential stability…

Finite-time High-probability Bounds for Polyak-Ruppert Averaged Iterates of Linear Stochastic Approximation

TLDR
A finite-time analysis of linear stochastic approximation algorithms with step size is provided, with tight dependence on the parameters in the higher-order terms, and instance-dependent bounds on the Polyak-Ruppert averaged sequence of iterates are proved.

Tight High Probability Bounds for Linear Stochastic Approximation with Fixed Stepsize

TLDR
A non-asymptotic analysis of linear stochastic approximation (LSA) algorithms with fixed stepsize, based on new moment and high-probability bounds for products of matrices, which are shown to be tight.

Projection-free Constrained Stochastic Nonconvex Optimization with State-dependent Markov Data

TLDR
A projection-free conditional gradient-type algorithm for constrained nonconvex stochastic optimization problems with Markovian data focused on the case when the transition kernel of the Markov chain is state-dependent.

A Single-Timescale Analysis For Stochastic Approximation With Multiple Coupled Sequences

TLDR
The merit of the results lies in that applying them to stochastic bilevel and compositional optimization problems, as well as RL problems leads to either relaxed assumptions or improvements over their existing performance guarantees.

Accelerated and instance-optimal policy evaluation with linear function approximation

TLDR
An accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality is developed.

Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning

TLDR
The online bootstrap method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm across a range of real RL environments.

The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning

TLDR
A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error $z_n := (\theta_n - \theta^*)/\sqrt{\alpha_n}$, and moment bounds combined with the CLT imply convergence of the normalized covariance, $\lim_{n\to\infty} E[z_n z_n^T] = \Sigma_\theta$, where $\Sigma_\theta$ is the asymptotic covariance appearing in the CLT.

References

Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning

We consider the dynamics of a linear stochastic approximation algorithm driven by Markovian noise, and derive finite-time bounds on the moments of the error, i.e., deviation of the output of the

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

TLDR
The bounds show that there is no discrepancy in the convergence rate between Markovian and martingale noise, only the constants are affected by the mixing time of the Markov chain.

Large Deviations Asymptotics and the Spectral Theory of Multiplicatively Regular Markov Processes

In this paper we continue the investigation of the spectral theory and exponential asymptotics of primarily discrete-time Markov processes, following Kontoyiannis and Meyn (2003). We introduce a new

Explicit Mean-Square Error Bounds for Monte-Carlo and Linear Stochastic Approximation

TLDR
It is shown that mean square error achieves the optimal rate of $O(1/n)$, subject to conditions on the step-size sequence, which is of great value in algorithm design.

Spectral theory and limit theorems for geometrically ergodic Markov processes

f, where $P$ is the transition kernel of the Markov chain and $\alpha \in \mathbb{C}$ is a constant. The function $\check{f}$ is an eigenfunction, with corresponding eigenvalue $\lambda$, for the kernel $(e^{\alpha F} P) = e^{\alpha F(x)} P(x, dy)$. A

Finite-Time Analysis and Restarting Scheme for Linear Two-Time-Scale Stochastic Approximation

  • Thinh T. Doan
  • Computer Science, Mathematics
    SIAM J. Control. Optim.
  • 2021
TLDR
The main focus is to characterize the finite-time complexity of this linear two-time-scale stochastic approximation under time-varying step sizes and Markovian noise, and it is shown that the mean square errors of the variables generated by the method converge to zero at a sublinear rate.

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

TLDR
Finite time convergence rates for TD learning with linear function approximation are proved and the authors provide results for the case when TD is applied to a single Markovian data stream where the algorithm's updates can be severely biased.

Stability of recursive stochastic tracking algorithms

  • Lei Guo
  • Mathematics, Computer Science
    Proceedings of 32nd IEEE Conference on Decision and Control
  • 1993
TLDR
It is shown that for a quite general class of random matrices $\{A_n\}$ of interest, the stability of such a vector equation can be guaranteed by that of a corresponding scalar linear equation, for which various results are given without requiring stationary or mixing conditions.

Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning

TLDR
This work designs an adaptive learning rate scheme which significantly improves the convergence rate over the known optimal polynomial decay rule, and can be used to potentially improve the performance of any other schedule where the learning rate is changed at pre-determined time instants.

Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?

TLDR
This paper describes the instance dependent behavior of the error of the said algorithms, identifies some conditions under which the answer to the above questions can be changed to the positive, and shows instance-dependent error bounds of magnitude O(1/t) for the constant stepsize iterate averaged versions of TD(0) and a novel variant of GTD, where the stepsize is chosen independently of the value estimation instance.