# Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

@inproceedings{Wei2021LastiterateCO, title={Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games}, author={Chen-Yu Wei and Chung-wei Lee and Mengxiao Zhang and Haipeng Luo}, booktitle={COLT}, year={2021} }

We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play. Our algorithm is based on running an Optimistic Gradient Descent Ascent algorithm on each state to learn the policies, with a critic that slowly learns the value of each state. To the best of our knowledge, this is the first algorithm in this setting that is simultaneously rational (converging to the opponent’s best…
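The per-state structure described in the abstract can be illustrated with a minimal full-information sketch in Python (NumPy). It assumes each player observes its exact gradient vector at every state ($Q_s y_s$ for the min-player, $Q_s^\top x_s$ for the max-player). It also keeps a single shared critic $V$, since in self-play two identically updated critics would coincide, and it replaces the paper's step-size schedules with hypothetical constants `eta` and `alpha`. All function and parameter names are illustrative; this is not the authors' implementation.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    cond = u - (css - 1.0) / k > 0
    rho = k[cond][-1]
    theta = (css[cond][-1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def per_state_ogda(cost, P, gamma=0.5, eta=0.1, alpha=0.05, iters=5000, x0=None, y0=None):
    """Sketch: projected OGDA at every state plus a slowly updated critic V.

    cost[s, a, b]: cost the min-player (x) pays to the max-player (y).
    P[s, a, b, s2]: probability of moving from s to s2 under actions (a, b).
    eta and alpha are hypothetical constant step sizes (policy and critic).
    """
    S, nA, nB = cost.shape
    x_hat = np.full((S, nA), 1.0 / nA) if x0 is None else np.array(x0, dtype=float)
    y_hat = np.full((S, nB), 1.0 / nB) if y0 is None else np.array(y0, dtype=float)
    x, y = x_hat.copy(), y_hat.copy()  # previous iterates used by the optimistic step
    V = np.zeros(S)
    for _ in range(iters):
        Q = cost + gamma * (P @ V)  # Q[s, a, b] built from the current critic
        x_new, y_new = np.empty_like(x), np.empty_like(y)
        for s in range(S):
            # Each player needs only its own gradient vector at state s.
            x_new[s] = project_simplex(x_hat[s] - eta * Q[s] @ y[s])
            y_new[s] = project_simplex(y_hat[s] + eta * Q[s].T @ x[s])
            x_hat[s] = project_simplex(x_hat[s] - eta * Q[s] @ y_new[s])
            y_hat[s] = project_simplex(y_hat[s] + eta * Q[s].T @ x_new[s])
        x, y = x_new, y_new
        # Critic: move V slowly toward the one-step value of the current policies.
        V = (1 - alpha) * V + alpha * np.einsum("sa,sab,sb->s", x, Q, y)
    return x, y, V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, nA, nB = 3, 2, 2
    cost = rng.uniform(size=(S, nA, nB))
    P = rng.uniform(size=(S, nA, nB, S))
    P /= P.sum(axis=-1, keepdims=True)
    x, y, V = per_state_ogda(cost, P)
    print("policies x:", x, "policies y:", y, "critic V:", V, sep="\n")
```

The critic update is an exponential moving average, so a small `alpha` makes $V$ drift slowly while the per-state OGDA updates track the matrix games $Q_s$ that the current critic induces. The step-size conditions under which convergence is actually proved should be taken from the paper itself.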

#### 9 Citations

Decentralized Q-Learning in Zero-sum Markov Games

- Computer Science, Mathematics
- ArXiv
- 2021

Proposes radically uncoupled Q-learning dynamics that are both rational and convergent: the dynamics converge to the best response to the opponent’s strategy when the opponent follows an asymptotically stationary strategy, and the value function estimates converge to the payoffs at a Nash equilibrium when both agents adopt the dynamics.

Provably Efficient Reinforcement Learning in Decentralized General-Sum Markov Games

- Computer Science
- ArXiv
- 2021

This paper addresses the problem of learning an equilibrium efficiently in general-sum Markov games through decentralized multi-agent reinforcement learning by finding a coarse correlated equilibrium (CCE), a solution concept that generalizes NE by allowing possible correlations among the agents’ strategies.

Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization

- Mathematics, Computer Science
- ArXiv
- 2021

Motivated by the algorithmic role of entropy regularization in single-agent reinforcement learning and game theory, this work develops provably efficient extragradient methods that find the quantal response equilibrium (QRE), i.e., the solution of an entropy-regularized two-player zero-sum matrix game, at a linear rate.

Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games

- Computer Science
- ArXiv
- 2021

A novel definition of Markov Potential Games (MPG) is presented that generalizes prior attempts at capturing complex stateful multi-agent coordination, and convergence of independent policy gradient to Nash policies is proved (polynomially fast in the approximation error) by adapting recent gradient dominance property arguments developed for single-agent MDPs to multi-agent learning settings.

V-Learning - A Simple, Efficient, Decentralized Algorithm for Multiagent RL

- Computer Science, Mathematics
- ArXiv
- 2021

This paper designs a new class of fully decentralized algorithms, V-learning, which provably learns Nash equilibria (in the two-player zero-sum setting) and correlated equilibria and coarse correlated equilibria (in the multiplayer general-sum setting) in a number of samples that scales only with $\max_{i\in[m]} A_i$, where $A_i$ is the number of actions of the $i$-th player.

Independent Learning in Stochastic Games

- Computer Science, Mathematics
- ArXiv
- 2021

This review paper presents the recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum stochastic games, together with a review of other contemporaneous algorithms for dynamic multi-agent learning in this setting.

The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces

- Computer Science, Mathematics
- ArXiv
- 2021

A new algorithm is proposed that can provably find the Nash equilibrium policy using a polynomial number of samples for any Markov game (MG) with low multi-agent Bellman-Eluder dimension, a new complexity measure adapted from its single-agent version (Jin et al., 2021).

Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization

- Computer Science, Mathematics
- ArXiv
- 2021

We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable,…

Optimal No-Regret Learning in General Games: Bounded Regret with Unbounded Step-Sizes via Clairvoyant MWU

- Computer Science, Economics
- ArXiv
- 2021

In this paper we solve the problem of no-regret learning in general games. Specifically, we provide a simple and practical algorithm that achieves constant regret with fixed step-sizes. The…

#### References

Showing 10 of 47 references.

Policy Optimization in Zero-Sum Markov Games: Fictitious Self-Play Provably Attains Nash Equilibria

- 2020

Fictitious Self-Play (FSP) has achieved significant empirical success in solving extensive-form games. However, from a theoretical perspective, it remains unknown whether FSP is guaranteed to…

Independent Policy Gradient Methods for Competitive Reinforcement Learning

- Computer Science
- NeurIPS
- 2020

It is shown that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.

Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games

- Computer Science, Mathematics
- NeurIPS
- 2019

This work appears to be the first to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria, and develops three projected nested-gradient methods that are guaranteed to converge to the NE of the game.

Tight last-iterate convergence rates for no-regret learning in multi-player games

- Computer Science, Mathematics
- NeurIPS
- 2020

The optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the gap function in smooth monotone games.

Linear Last-iterate Convergence for Matrix Games and Stochastic Games

- Computer Science, Mathematics
- ArXiv
- 2020

This work significantly expands the understanding of OGDA by introducing a set of sufficient conditions under which OGDA exhibits concrete last-iterate convergence rates with a constant learning rate, and shows that matrix games satisfy these conditions and OGDA converges exponentially fast without any additional assumptions.
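For intuition about constant-learning-rate OGDA on matrix games, here is a minimal sketch of projected OGDA for $\min_{x}\max_{y} x^\top A y$ over probability simplices. The two-sequence update form, the default `eta = 0.1`, and the names `ogda_matrix_game` and `duality_gap` are assumptions for illustration, not code from the cited paper.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    cond = u - (css - 1.0) / k > 0
    rho = k[cond][-1]
    theta = (css[cond][-1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def ogda_matrix_game(A, eta=0.1, iters=5000, x0=None, y0=None):
    """Projected OGDA for min_x max_y x^T A y over simplices; returns the last iterates."""
    m, n = A.shape
    x_hat = np.full(m, 1.0 / m) if x0 is None else np.array(x0, dtype=float)
    y_hat = np.full(n, 1.0 / n) if y0 is None else np.array(y0, dtype=float)
    x, y = x_hat.copy(), y_hat.copy()
    for _ in range(iters):
        # Play step: reuse the previous iterate's gradient as an optimistic prediction.
        x_new = project_simplex(x_hat - eta * A @ y)
        y_new = project_simplex(y_hat + eta * A.T @ x)
        # Auxiliary step: update the secondary sequence with the fresh gradient.
        x_hat = project_simplex(x_hat - eta * A @ y_new)
        y_hat = project_simplex(y_hat + eta * A.T @ x_new)
        x, y = x_new, y_new
    return x, y


def duality_gap(A, x, y):
    """max_y' x^T A y' - min_x' x'^T A y; nonnegative, and zero exactly at a Nash equilibrium."""
    return np.max(A.T @ x) - np.min(A @ y)


if __name__ == "__main__":
    # Rock-paper-scissors: the unique equilibrium is uniform for both players.
    A = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
    x, y = ogda_matrix_game(A, x0=[0.5, 0.3, 0.2], y0=[0.25, 0.4, 0.35])
    print(x, y, duality_gap(A, x, y))
```

On rock-paper-scissors the last iterate itself (not only the average iterate) should approach the uniform equilibrium, and the duality gap should shrink toward zero.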

Optimization, Learning, and Games with Predictable Sequences

- Computer Science, Mathematics
- NIPS
- 2013

It is proved that a version of Optimistic Mirror Descent can be used by two strongly uncoupled players in a finite zero-sum matrix game to converge to the minimax equilibrium at the rate of $O(\log T / T)$.

Multiplicative Weights Update in Zero-Sum Games

- Mathematics, Computer Science
- EC
- 2018

If equilibria are indeed predictive even for the benchmark class of zero-sum games, agents in practice must deviate robustly from the axiomatic perspective of optimization driven dynamics as captured by MWU and variants and apply carefully tailored equilibrium-seeking behavioral dynamics.

Rational and Convergent Learning in Stochastic Games

- Mathematics, Computer Science
- IJCAI
- 2001

This paper introduces two properties as desirable for a learning agent in the presence of other learning agents, namely rationality and convergence, and contributes a new learning algorithm, WoLF policy hill-climbing, that is proven to be rational.

Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games

- Mathematics, Computer Science
- ICML
- 2015

This paper provides a novel and unified error propagation analysis in $L_p$-norm of three well-known algorithms adapted to Stochastic Games and shows that it can achieve a stationary policy which is $\frac{2\gamma\epsilon + \epsilon'}{(1-\gamma)^2}$-optimal.

AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents

- Computer Science, Mathematics
- Machine Learning
- 2006

AWESOME is presented, the first algorithm that is guaranteed to have the two properties in games with arbitrary numbers of actions and players and it is still the only algorithm that does so while only relying on observing the other players' actual actions (not their mixed strategies).