• Corpus ID: 234767752

A Discrete-Time Switching System Analysis of Q-learning

Donghwan Lee, Jianghai Hu, Niao He
This paper develops a novel control-theoretic framework to analyze the non-asymptotic convergence of Q-learning. We show that the dynamics of asynchronous Q-learning with a constant step-size can be naturally formulated as a discrete-time stochastic affine switching system. Moreover, the evolution of the Q-learning estimation error is over- and underestimated by trajectories of two simpler dynamical systems. Based on these two systems, we derive a new finite-time error bound of asynchronous Q… 
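The update that the abstract refers to, asynchronous Q-learning with a constant step-size, can be sketched as follows. This is a minimal illustration rather than the paper's analysis. The tabular arrays `P` and `R`, the uniformly random behavior policy, and the function names `async_q_learning` and `q_value_iteration` are all assumptions introduced here. Only the visited state-action pair is updated at each step, and the greedy `max` selects which affine update is active, which is the sense in which the iteration switches.

```python
import numpy as np


def async_q_learning(P, R, gamma=0.9, alpha=0.1, steps=20000, seed=0):
    """Tabular asynchronous Q-learning with a constant step-size.

    P: array of shape (S, A, S) with transition probabilities (assumed known
       here only so that a trajectory can be simulated).
    R: array of shape (S, A) with rewards.
    The behavior policy is uniformly random (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(steps):
        a = rng.integers(n_actions)               # random exploratory action
        s_next = rng.choice(n_states, p=P[s, a])  # sample the next state
        td_target = R[s, a] + gamma * Q[s_next].max()
        # Only the visited (s, a) entry changes: the asynchronous update.
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next
    return Q


def q_value_iteration(P, R, gamma=0.9, iters=2000):
    """Model-based Q-value iteration, used only as a reference solution."""
    Q = np.zeros(R.shape)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q
```

Comparing the output of `async_q_learning` with `q_value_iteration` on a small MDP gives an empirical view of the estimation error whose finite-time behavior the paper bounds.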

Finite-Time Analysis of Constant Step-Size Q-Learning : Switching System Approach Revisited
This technical note revisits the novel switching system framework in [1] for analyzing the finite-time convergence of Q-learning, and improves the analysis by replacing the average iteration with the final iteration, which is simpler and more common in the literature.
Analysis of Temporal Difference Learning: Linear System Approach
A simple control-theoretic analysis of TD-learning is proposed, which exploits linear system models and standard notions from the linear systems community, and provides simple new templates for RL analysis along with additional insights on TD-learning and RL based on ideas from control theory.
Control Theoretic Analysis of Temporal Difference Learning
This paper proposes a control-theoretic analysis of linear stochastic iterative algorithms and temporal difference learning, which exploits standard notions from the linear systems and control community and provides additional insights on TD-learning and reinforcement learning using simple concepts and analysis tools from control theory.


Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
The above bound improves upon the state-of-the-art result by a factor of at least 1 up to some logarithmic factor, provided that a proper constant learning rate is adopted.
Error bounds for constant step-size Q-learning
Double Q-learning
An alternative way to approximate the maximum expected value for any set of random variables is introduced, and the resulting double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value.
A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms
It is shown that the nonlinear ODE models associated with Q-learning and many of its variants can be naturally formulated as affine switching systems.
A Lyapunov Theory for Finite-Sample Guarantees of Asynchronous Q-Learning and TD-Learning Variants
This paper develops a unified framework to study finite-sample convergence guarantees of a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do this by first
Finite-Time Analysis of Asynchronous Stochastic Approximation and Q-Learning
A general asynchronous Stochastic Approximation scheme featuring a weighted infinity-norm contractive operator is considered, and a bound on its finite-time convergence rate on a single trajectory is proved.
Stochastic approximation with cone-contractive operators: Sharp 𝓁∞-bounds for Q-learning
These results show that relative to model-based Q-iteration, the ℓ∞-based sample complexity of Q-learning is suboptimal in terms of the discount factor γ, and it is shown via simulation that the dependence of the bounds cannot be improved in a worst-case sense.
Nonlinear Systems
Nonlinearity is ubiquitous in physical phenomena. Fluid and plasma mechanics, gas dynamics, elasticity, relativity, chemical reactions, combustion, ecology, biomechanics, and many, many other
Speedy Q-Learning
We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound
Stability and Stabilizability of Switched Linear Systems: A Survey of Recent Results
This paper focuses on the stability analysis for switched linear systems under arbitrary switching, and highlights necessary and sufficient conditions for asymptotic stability.