# A Discrete-Time Switching System Analysis of Q-learning

@inproceedings{Lee2021ADS, title={A Discrete-Time Switching System Analysis of Q-learning}, author={Donghwan Lee and Jianghai Hu and Niao He}, year={2021} }

This paper develops a novel control-theoretic framework to analyze the non-asymptotic convergence of Q-learning. We show that the dynamics of asynchronous Q-learning with a constant step-size can be naturally formulated as a discrete-time stochastic affine switching system. Moreover, the evolution of the Q-learning estimation error is over- and underestimated by trajectories of two simpler dynamical systems. Based on these two systems, we derive a new finite-time error bound of asynchronous Q…
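The following is a minimal sketch of the switching-system view described above. It runs on a hypothetical random MDP and rests on two assumptions. First, the pair `(s, a)` updated at each step is drawn from a fixed distribution `d[s][a]`. Second, only the mean of one update is considered, so the noise term is dropped.

The mode `sigma` is the greedy policy of the current iterate. The mean update is therefore an affine map `A_sigma Q + b` whose matrix switches with `Q`. The script then checks numerically that the one-step error `Q - Q*` is bounded below by the fixed linear map `A_{pi*}` and above by the switched map `A_{greedy(Q)}`. This is one way to read the over- and underestimating comparison systems. The helper names and the `A_sigma` notation are illustrative and are not taken from the paper.

```python
import random

random.seed(0)
nS, nA, gamma, alpha = 4, 2, 0.9, 0.5


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


# Hypothetical random MDP: P[s][a][t] = Pr(next state t | s, a), rewards R[s][a] in [0, 1).
P = [[normalize([random.random() for _ in range(nS)]) for _ in range(nA)] for _ in range(nS)]
R = [[random.random() for _ in range(nA)] for _ in range(nS)]
# Assumed fixed probability d[s][a] that pair (s, a) is the one updated at a given step.
flat = normalize([random.random() + 0.1 for _ in range(nS * nA)])
d = [flat[s * nA:(s + 1) * nA] for s in range(nS)]


def greedy(Q):
    """Greedy policy of Q; it plays the role of the switching signal."""
    return [max(range(nA), key=lambda a: Q[s][a]) for s in range(nS)]


def A_apply(sigma, E):
    """Apply A_sigma = I + alpha*D*(gamma*P*Pi_sigma - I) to E, entry by entry."""
    return [[E[s][a] + alpha * d[s][a] * (
                gamma * sum(P[s][a][t] * E[t][sigma[t]] for t in range(nS)) - E[s][a])
             for a in range(nA)] for s in range(nS)]


def expected_update(Q):
    """Mean of one asynchronous Q-learning step, written as A_sigma Q + b with sigma = greedy(Q)."""
    AQ = A_apply(greedy(Q), Q)
    return [[AQ[s][a] + alpha * d[s][a] * R[s][a] for a in range(nA)] for s in range(nS)]


# Q* by value iteration (the Bellman optimality operator is a gamma-contraction).
Qstar = [[0.0] * nA for _ in range(nS)]
for _ in range(500):
    Qstar = [[R[s][a] + gamma * sum(P[s][a][t] * max(Qstar[t]) for t in range(nS))
              for a in range(nA)] for s in range(nS)]
pi_star = greedy(Qstar)

# One-step comparison on the error E = Q - Q*, entrywise:
#   A_{pi*} E  <=  (mean next iterate) - Q*  <=  A_{greedy(Q)} E
for _ in range(100):
    Q = [[random.uniform(-5, 5) for _ in range(nA)] for _ in range(nS)]
    E = [[Q[s][a] - Qstar[s][a] for a in range(nA)] for s in range(nS)]
    nxt = expected_update(Q)
    low, up = A_apply(pi_star, E), A_apply(greedy(Q), E)
    for s in range(nS):
        for a in range(nA):
            err = nxt[s][a] - Qstar[s][a]
            assert low[s][a] - 1e-9 <= err <= up[s][a] + 1e-9
print("one-step sandwich holds on 100 random iterates")
```

Both bounds follow from two facts. The lower bound uses max_a Q(s', a) ≥ Q(s', π*(s')). The upper bound uses max_a Q*(s', a) ≥ Q*(s', σ_Q(s')). Both inequalities are preserved after multiplying by the nonnegative factors αd, γ and P.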

## 3 Citations

Finite-Time Analysis of Constant Step-Size Q-Learning : Switching System Approach Revisited

- Computer Science
- 2022

This technical note revisits the novel switching system framework in [1] for analyzing the finite-time convergence of Q-learning. It improves the analysis by bounding the error of the final iterate instead of the averaged iterate, which is simpler and more common in the literature.

Analysis of Temporal Difference Learning: Linear System Approach

- Computer Science · ArXiv
- 2022

A simple control-theoretic analysis of TD-learning that exploits linear system models and standard notions from the linear systems community. It provides new, simple templates for RL analysis, along with additional insights on TD-learning and RL based on ideas from control theory.
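The linear-system view can be illustrated with a minimal sketch under stated assumptions. It uses tabular TD(0) on a hypothetical random Markov chain with a fixed state-sampling distribution `d[s]`, and it keeps only the mean update, with noise omitted. That mean update is the linear system `V_{k+1} = A V_k + b` with `A = I + alpha*D*(gamma*P - I)`.

When `alpha*d[s] <= 1`, `A` is entrywise nonnegative and its induced infinity-norm equals `1 - alpha*min(d)*(1 - gamma)`. The error to `V^pi` therefore contracts at least at that rate every step. The names and numbers are illustrative, not taken from the cited work.

```python
import random

random.seed(1)
nS, gamma, alpha = 5, 0.9, 0.8


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


# Hypothetical policy-induced chain P[s][t], rewards r[s], and state-sampling distribution d[s].
P = [normalize([random.random() for _ in range(nS)]) for _ in range(nS)]
r = [random.random() for _ in range(nS)]
d = normalize([random.random() + 0.1 for _ in range(nS)])


def td_mean_step(V):
    """Mean tabular TD(0) step V + alpha*D*(r + gamma*P V - V), i.e. the linear system A V + b."""
    return [V[s] + alpha * d[s] * (r[s] + gamma * sum(P[s][t] * V[t] for t in range(nS)) - V[s])
            for s in range(nS)]


# V^pi as the fixed point of the Bellman expectation operator.
Vpi = [0.0] * nS
for _ in range(500):
    Vpi = [r[s] + gamma * sum(P[s][t] * Vpi[t] for t in range(nS)) for s in range(nS)]

# With alpha*d[s] <= 1, A = I + alpha*D*(gamma*P - I) is entrywise nonnegative with row sums
# 1 - alpha*d[s]*(1 - gamma), so ||A||_inf equals this contraction factor:
rho = 1 - alpha * min(d) * (1 - gamma)

V = [0.0] * nS
errs = []
for _ in range(2000):
    errs.append(max(abs(V[s] - Vpi[s]) for s in range(nS)))
    V = td_mean_step(V)
print(f"rho = {rho:.4f}, initial error = {errs[0]:.3f}, final error = {errs[-1]:.2e}")
```

Since `V^pi` is a fixed point of the mean update, the error obeys `V_{k+1} - V^pi = A (V_k - V^pi)`. This is exactly the kind of linear error dynamics that standard stability tools from control theory address.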

Control Theoretic Analysis of Temporal Difference Learning

- Computer Science · ArXiv
- 2021

This paper proposes a control-theoretic analysis of linear stochastic iterative algorithms and temporal difference learning. The analysis exploits standard notions from the linear system control community and provides additional insights on TD-learning and reinforcement learning using simple concepts and analysis tools from control theory.

## References

Showing 10 of 21 references.

Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

- Computer Science · IEEE Transactions on Information Theory
- 2022

The above bound improves upon the state-of-the-art result by a factor of at least |S||A| up to some logarithmic factor, provided that a proper constant learning rate is adopted.

Error bounds for constant step-size Q-learning

- Computer Science, Mathematics · Syst. Control. Lett.
- 2012

Double Q-learning

- Computer Science · NIPS
- 2010

An alternative way to approximate the maximum expected value of a set of random variables is introduced. The resulting double estimator is shown to sometimes underestimate, rather than overestimate, the maximum expected value.
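The contrast between the two estimators can be seen in a small Monte Carlo sketch, assuming Gaussian samples with hypothetical true means whose maximum is 0.2.

- **Single estimator.** It takes the maximum of the sample means and is biased upward.
- **Double estimator.** It picks the argmax on one half of the samples and evaluates that choice on the other, disjoint half. Its expectation is a weighted average of the true means, so it cannot exceed the true maximum.

This is the basic one-directional form. Swapping the roles of the two halves and averaging is a common variant not shown here.

```python
import random
import statistics

random.seed(2)
mu = [0.0] * 9 + [0.2]   # hypothetical true means; the true maximum is 0.2
n, trials = 10, 2000     # samples per variable, Monte Carlo repetitions

single, double = [], []
for _ in range(trials):
    samples = [[random.gauss(m, 1.0) for _ in range(n)] for m in mu]
    # Single estimator: max over the sample means (biased upward).
    means = [statistics.fmean(x) for x in samples]
    single.append(max(means))
    # Double estimator: pick the argmax on half A, evaluate it on the disjoint half B.
    mean_a = [statistics.fmean(x[: n // 2]) for x in samples]
    mean_b = [statistics.fmean(x[n // 2:]) for x in samples]
    a_star = max(range(len(mu)), key=lambda i: mean_a[i])
    double.append(mean_b[a_star])

print(f"true max = {max(mu)}, single = {statistics.fmean(single):.3f}, "
      f"double = {statistics.fmean(double):.3f}")
```

Because half B is independent of the selection made on half A, the expected value of `mean_b[a_star]` equals the expected true mean of the selected variable. That is why the double estimator errs on the low side.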

A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms

- Mathematics, Computer Science · NeurIPS
- 2020

It is shown that the nonlinear ODE models associated with Q-learning and many of its variants can be naturally formulated as affine switching systems.

A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants

- Mathematics, Computer Science · ArXiv
- 2021

This paper develops a unified framework to study finite-sample convergence guarantees of a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do this by first…

Finite-Time Analysis of Asynchronous Stochastic Approximation and Q-Learning

- Computer Science, Mathematics · COLT 2020
- 2020

A general asynchronous Stochastic Approximation scheme featuring a weighted infinity-norm contractive operator is considered, and a bound on its finite-time convergence rate on a single trajectory is proved.
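The contraction property referred to here can be checked numerically in its simplest instance, assuming unit weights. On a hypothetical random MDP, the Bellman optimality operator is a γ-contraction in the plain sup-norm. The sketch tests the inequality on random pairs of Q-functions. It is neither the cited paper's proof nor its weighted norm.

```python
import random

random.seed(3)
nS, nA, gamma = 4, 3, 0.9


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


# Hypothetical random MDP: P[s][a][t] = Pr(next state t | s, a), rewards R[s][a].
P = [[normalize([random.random() for _ in range(nS)]) for _ in range(nA)] for _ in range(nS)]
R = [[random.random() for _ in range(nA)] for _ in range(nS)]


def bellman(Q):
    """Bellman optimality operator: (TQ)(s, a) = R(s, a) + gamma * E[max_a' Q(s', a')]."""
    return [[R[s][a] + gamma * sum(P[s][a][t] * max(Q[t]) for t in range(nS))
             for a in range(nA)] for s in range(nS)]


def sup_dist(Q1, Q2):
    """Unweighted infinity-norm distance between two Q-tables."""
    return max(abs(x - y) for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2))


ratios = []
for _ in range(200):
    Q1 = [[random.uniform(-10, 10) for _ in range(nA)] for _ in range(nS)]
    Q2 = [[random.uniform(-10, 10) for _ in range(nA)] for _ in range(nS)]
    ratios.append(sup_dist(bellman(Q1), bellman(Q2)) / sup_dist(Q1, Q2))
print(f"largest observed ratio = {max(ratios):.3f} (gamma = {gamma})")
```

The bound follows from |max_a Q1(s', a) - max_a Q2(s', a)| ≤ max_a |Q1(s', a) - Q2(s', a)|, combined with P being a probability distribution.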

Stochastic approximation with cone-contractive operators: Sharp 𝓁∞-bounds for Q-learning

- Mathematics, Computer Science · ArXiv
- 2019

These results show that, relative to model-based Q-iteration, the ℓ∞-based sample complexity of Q-learning is suboptimal in terms of the discount factor γ, and it is shown via simulation that the dependence of the bounds cannot be improved in a worst-case sense.

Nonlinear Systems

- Mathematics
- 2013

Nonlinearity is ubiquitous in physical phenomena. Fluid and plasma mechanics, gas dynamics, elasticity, relativity, chemical reactions, combustion, ecology, biomechanics, and many, many other…

Speedy Q-Learning

- Computer Science · NIPS
- 2011

We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound…

Stability and Stabilizability of Switched Linear Systems: A Survey of Recent Results

- Mathematics · IEEE Transactions on Automatic Control
- 2009

This paper focuses on the stability analysis for switched linear systems under arbitrary switching, and highlights necessary and sufficient conditions for asymptotic stability.
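Stability under arbitrary switching is a genuine question, as a small sketch with hypothetical 2x2 matrices shows. Two modes that are each stable can still diverge when alternated. By contrast, a common bound below 1 on the induced ∞-norm of every mode is a simple sufficient condition for stability under any switching sequence. This is a standard sufficient condition chosen for illustration. It is not one of the survey's necessary-and-sufficient results.

```python
import random


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def inf_norm(A):
    """Induced infinity-norm: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


# Two modes that are each Schur stable (both eigenvalues equal 0.5).
A1 = [[0.5, 2.0], [0.0, 0.5]]
A2 = [[0.5, 0.0], [2.0, 0.5]]

# Alternating A1, A2: the two-step map A2 A1 = [[0.25, 1], [1, 4.25]] has spectral radius about 4.49.
x = [1.0, 1.0]
for k in range(20):
    x = matvec(A1 if k % 2 == 0 else A2, x)
print(f"max |x| after 20 alternating steps: {max(map(abs, x)):.3g}")

# Sufficient condition: if ||B_i||_inf <= rho < 1 for every mode, then
# ||y_k||_inf <= rho**k * ||y_0||_inf under ANY switching sequence.
B1 = [[0.3, 0.4], [0.1, 0.6]]
B2 = [[0.6, 0.2], [0.5, 0.2]]
rho = max(inf_norm(B1), inf_norm(B2))

random.seed(4)
y = [1.0, -1.0]
y0 = max(map(abs, y))
bound_ok = True
for k in range(1, 51):
    y = matvec(random.choice([B1, B2]), y)  # arbitrary (here random) switching
    bound_ok = bound_ok and max(map(abs, y)) <= rho ** k * y0 + 1e-15
print(f"rho = {rho}, bound held for 50 random switches: {bound_ok}")
```

As a concrete case, consider a tabular Q-learning mean update written as I + αD(γPΠ_σ - I), using assumed notation:

- D is the diagonal matrix of sampling probabilities.
- P is the transition matrix.
- Π_σ selects the action of policy σ.

This matrix is entrywise nonnegative with row sums 1 - αd(s,a)(1-γ) whenever αd(s,a) ≤ 1. Every mode therefore satisfies a bound of this ∞-norm type.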