Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games
@article{Zeng2022RegularizedGD,
  title={Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games},
  author={Sihan Zeng and Thinh T. Doan and Justin K. Romberg},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.13746}
}
We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to its formulation as a minimax optimization program, a natural approach to solve the problem is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understanding of this method remains limited. In our paper, we consider solving an entropy-regularized variant of the…
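A minimal sketch of this alternating scheme on a single-state (matrix) simplification of the game, where entropy regularization makes the objective strongly convex-concave over the two policy simplices; the payoff matrix A, step size eta, and regularization weight tau below are illustrative choices, not values from the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))        # illustrative payoff: x minimizes x^T A y, y maximizes
theta, phi = np.zeros(3), np.zeros(3)  # softmax logits of the two policies
tau, eta = 0.1, 0.5                    # illustrative regularization weight and step size

for t in range(5000):
    x, y = softmax(theta), softmax(phi)
    # Regularized value V_tau(x, y) = x^T A y - tau*H(x) + tau*H(y),
    # with H the Shannon entropy, so V_tau is strongly convex-concave.
    gx = A @ y + tau * (np.log(x) + 1.0)    # dV_tau/dx
    Jx = np.diag(x) - np.outer(x, x)        # softmax Jacobian
    theta -= eta * Jx @ gx                  # descent step for the min player
    x = softmax(theta)                      # max player responds to the updated x
    gy = A.T @ x - tau * (np.log(y) + 1.0)  # dV_tau/dy
    Jy = np.diag(y) - np.outer(y, y)
    phi += eta * Jy @ gy                    # ascent step for the max player

print(softmax(theta), softmax(phi))  # approximate regularized (quantal response) equilibrium
```

With tau > 0 the regularized objective has a unique saddle point, which is why such alternating updates can be analyzed despite the non-convexity/non-concavity of the unregularized game.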
3 Citations
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games
- Computer Science · ArXiv
- 2022
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games, and proposes a single-loop policy optimization method with symmetric updates from both agents, which achieves last-iterate linear convergence to the quantal response equilibrium of the regularized problem.
Abstracting Imperfect Information Away from Two-Player Zero-Sum Games
- Computer Science · ArXiv
- 2023
By Samuel Sokota (Carnegie Mellon University), Ryan D'Orazio (Mila, Université de Montréal), et al.
Independent and Decentralized Learning in Markov Potential Games
- Economics · ArXiv
- 2022
We propose a multi-agent reinforcement learning dynamics, and analyze its convergence properties in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized…
References
Showing 1-10 of 46 references
Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods
- Computer Science · NeurIPS
- 2019
This paper proposes a multi-step gradient descent-ascent algorithm that finds an $\varepsilon$-first-order stationary point of the game in $\widetilde{O}(\varepsilon^{-3.5})$ iterations, which is the best known rate in the literature.
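The multi-step idea, taking several ascent steps to approximately solve the inner maximization before each descent step, can be sketched on a toy nonconvex-strongly-concave objective; the objective and constants below are illustrative, not the paper's:

```python
import numpy as np

# Toy objective f(x, y) = x^4/4 - x^2/2 + x*y - y^2/2:
# nonconvex in x, strongly concave in y.
def grad_x(x, y):
    return x**3 - x + y   # df/dx

def grad_y(x, y):
    return x - y          # df/dy

x, y = 2.0, 0.0
eta_x, eta_y, K = 0.05, 0.2, 10   # illustrative step sizes / inner-loop length

for t in range(2000):
    # Multi-step ascent: approximately solve max_y f(x, y) first.
    for _ in range(K):
        y += eta_y * grad_y(x, y)
    # One descent step on x against the (approximately) maximizing y.
    x -= eta_x * grad_x(x, y)

print(x, y)  # near a first-order stationary point of max_y f(., y)
```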
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium
- Computer Science · COLT
- 2020
This work develops provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves and proposes an optimistic variant of the least-squares minimax value iteration algorithm.
On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging
- Computer Science · AISTATS
- 2022
The stochastic bilinear minimax optimization problem is studied: an analysis of the same-sample Stochastic ExtraGradient method with constant step size is presented, together with variations of the method that yield favorable convergence.
Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games
- Computer Science · ArXiv
- 2021
This is the first quantitative analysis of policy gradient methods with function approximation for two-player zero-sum Markov games; it thoroughly characterizes the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error.
A unified view of entropy-regularized Markov decision processes
- Computer Science · ArXiv
- 2017
A general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs) is proposed, showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations.
Better Theory for SGD in the Nonconvex World
- Computer Science · ArXiv
- 2020
A new variant of the recently introduced expected smoothness assumption, which governs the behaviour of the second moment of the stochastic gradient, is proposed, and this assumption is shown to be both more general and more reasonable than those made in all prior work.
What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?
- Computer Science · ICML
- 2020
A proper mathematical definition of local optimality for this sequential setting, termed local minimax, is proposed, and its properties and existence results are presented.
On the Global Convergence Rates of Softmax Policy Gradient Methods
- Computer Science · ICML
- 2020
It is shown that with the true gradient, policy gradient with a softmax parametrization converges at an $O(1/t)$ rate, with constants depending on the problem and initialization, significantly expanding recent asymptotic convergence results.
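The flavor of this result can be seen on the simplest instance, a single-state problem (a bandit) with exact gradients, where the softmax policy gradient update has a closed form; the reward vector and step size below are illustrative:

```python
import numpy as np

r = np.array([1.0, 0.8, 0.2])  # illustrative (known) reward vector
theta = np.zeros(3)            # softmax logits, uniform initialization
eta = 0.4                      # illustrative step size

for t in range(1, 501):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()             # softmax policy
    # Exact policy gradient of E_{a~pi}[r_a] w.r.t. the logits:
    # d/dtheta_i = pi_i * (r_i - pi^T r)
    theta += eta * pi * (r - pi @ r)
    if t % 100 == 0:
        print(t, r.max() - pi @ r)  # suboptimality shrinks roughly like 1/t
```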
Independent Policy Gradient Methods for Competitive Reinforcement Learning
- Computer Science · NeurIPS
- 2020
It is shown that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning
- Mathematics, Computer Science · ArXiv
- 2021
The main results reproduce the best-known convergence rates for the general policy optimization problem and show how they can be used to derive a state-of-the-art rate for online linear-quadratic regulator (LQR) control.