Corpus ID: 231855674

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

Chen-Yu Wei, Chung-wei Lee, Mengxiao Zhang, Haipeng Luo
We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play. Our algorithm is based on running an Optimistic Gradient Descent/Ascent algorithm on each state to learn the policies, with a critic that slowly learns the value of each state. To the best of our knowledge, this is the first algorithm in this setting that is simultaneously rational (converging to the opponent’s best…
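The per-state building block, optimistic gradient descent/ascent on a zero-sum matrix game, can be sketched as follows. This is a minimal single-state illustration in NumPy under assumed defaults (step size, horizon), not the paper's full algorithm; the slow critic and multi-state machinery are omitted:

```python
import numpy as np

def simplex_proj(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def ogda_matrix_game(G, T=3000, eta=0.05, x0=None, y0=None):
    """Optimistic gradient descent/ascent on the zero-sum matrix game G:
    x minimizes x @ G @ y while y maximizes it; returns the last iterate."""
    n, m = G.shape
    x = np.ones(n) / n if x0 is None else np.array(x0, dtype=float)
    y = np.ones(m) / m if y0 is None else np.array(y0, dtype=float)
    gx_prev, gy_prev = G @ y, G.T @ x
    for _ in range(T):
        gx, gy = G @ y, G.T @ x
        # Optimistic step: extrapolate with 2*g_t - g_{t-1} before projecting.
        x = simplex_proj(x - eta * (2.0 * gx - gx_prev))
        y = simplex_proj(y + eta * (2.0 * gy - gy_prev))
        gx_prev, gy_prev = gx, gy
    return x, y
```

On matching pennies, whose unique equilibrium is the uniform mixed strategy, the last iterate of this sketch approaches (0.5, 0.5) even from a skewed start, in contrast to plain gradient descent/ascent, which cycles.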
Decentralized Q-Learning in Zero-sum Markov Games
A radically uncoupled Q-learning dynamics that is both rational and convergent: the dynamics converges to the best response to the opponent’s strategy when the opponent follows an asymptotically stationary strategy; the value function estimates converge to the payoffs at a Nash equilibrium when both agents adopt the dynamics.
Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games
This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning by finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents’ strategies.
Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization
Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, this work develops provably efficient extragradient methods that find the quantal response equilibrium (QRE), the solution concept for zero-sum two-player matrix games with entropy regularization, at a linear rate.
Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games
A novel definition of Markov Potential Games (MPG) is presented that generalizes prior attempts at capturing complex stateful multi-agent coordination, and proves convergence (polynomially fast in the approximation error) of independent policy gradient to Nash policies by adapting recent gradient-dominance arguments developed for single-agent MDPs to multi-agent learning settings.
V-Learning - A Simple, Efficient, Decentralized Algorithm for Multiagent RL
  • Chi Jin, Qinghua Liu, Yuanhao Wang, Tiancheng Yu
  • Computer Science, Mathematics
  • ArXiv
  • 2021
This paper designs a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting), correlated equilibria, and coarse correlated equilibria (in the multi-player general-sum setting) in a number of samples that only scales with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions of the $i$-th player.
Independent Learning in Stochastic Games
This review paper presents the recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum stochastic games, together with a review of other contemporaneous algorithms for dynamic multi-agent learning in this setting.
The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces
A new algorithm is proposed that can provably find the Nash equilibrium policy using a polynomial number of samples, for any MG with low multi-agent Bellman-Eluder dimension—a new complexity measure adapted from its single-agent version (Jin et al., 2021).
Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization
We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable, and strongly concave in the maximization variable.
Optimal No-Regret Learning in General Games: Bounded Regret with Unbounded Step-Sizes via Clairvoyant MWU
In this paper we solve the problem of no-regret learning in general games. Specifically, we provide a simple and practical algorithm that achieves constant regret with fixed step-sizes. …
  • 2020
Fictitious Self-Play (FSP) has achieved significant empirical success in solving extensive-form games. However, from a theoretical perspective, it remains unknown whether FSP is guaranteed to…
Independent Policy Gradient Methods for Competitive Reinforcement Learning
It is shown that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games
This work appears to be the first to investigate the optimization landscape of LQ games and to provably show convergence of policy optimization methods to Nash equilibria; it develops three projected nested-gradient methods that are guaranteed to converge to the NE of the game.
Tight last-iterate convergence rates for no-regret learning in multi-player games
The optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the gap function in smooth monotone games.
Linear Last-iterate Convergence for Matrix Games and Stochastic Games
This work significantly expands the understanding of OGDA by introducing a set of sufficient conditions under which OGDA exhibits concrete last-iterate convergence rates with a constant learning rate, and shows that matrix games satisfy these conditions and OGDA converges exponentially fast without any additional assumptions.
Optimization, Learning, and Games with Predictable Sequences
It is proved that a version of Optimistic Mirror Descent can be used by two strongly-uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of $O(\log T/T)$.
Multiplicative Weights Update in Zero-Sum Games
If equilibria are indeed predictive even for the benchmark class of zero-sum games, then agents in practice must deviate robustly from the axiomatic perspective of optimization-driven dynamics, as captured by MWU and its variants, and apply carefully tailored equilibrium-seeking behavioral dynamics.
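The non-convergence of plain MWU's last iterate is easy to reproduce on matching pennies. The following is a hypothetical minimal sketch (not code from the paper): the iterate spirals away from the interior equilibrium even though the time average converges.

```python
import numpy as np

def mwu_last_iterate(G, x0, y0, T=500, eta=0.1):
    """Simultaneous multiplicative weights update on the zero-sum game G.
    The row player minimizes x @ G @ y; the column player maximizes it.
    Returns the final (last-iterate) strategy pair, not the time average."""
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    for _ in range(T):
        x_new = x * np.exp(-eta * (G @ y))   # row player: lower loss -> more weight
        y_new = y * np.exp(eta * (G.T @ x))  # column player: higher payoff -> more weight
        x, y = x_new / x_new.sum(), y_new / y_new.sum()
    return x, y
```

Starting near the unique interior equilibrium (0.5, 0.5) of matching pennies, the KL divergence from the equilibrium to the last iterate grows over time, which is the divergence phenomenon that motivates equilibrium-seeking alternatives to MWU.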
Rational and Convergent Learning in Stochastic Games
This paper introduces two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence, and contributes a new learning algorithm, WoLF policy hill-climbing, which is proven to be rational.
Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
This paper provides a novel and unified error propagation analysis in $L_p$-norm of three well-known algorithms adapted to Stochastic Games, and shows that one can achieve a stationary policy that is $\frac{2\gamma\epsilon+\epsilon'}{(1-\gamma)^2}$-optimal.
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
AWESOME is presented, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players, and it is still the only algorithm that does so while only relying on observing the other players' actual actions (not their mixed strategies).