Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games

Sihan Zeng, Thinh T. Doan, and Justin K. Romberg
We study the problem of finding the Nash equilibrium in a two-player zero-sum Markov game. Due to its formulation as a minimax optimization program, a natural approach to solve the problem is to perform gradient descent/ascent with respect to each player in an alternating fashion. However, due to the non-convexity/non-concavity of the underlying objective function, theoretical understanding of this method is limited. In our paper, we consider solving an entropy-regularized variant of the…
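To make the entropy-regularized descent/ascent idea concrete, here is a minimal sketch on a regularized matrix game rather than a full Markov game (this is an illustration, not the paper's exact algorithm; the payoff matrix, step size `eta`, and regularization weight `tau` are arbitrary choices). Mirror-style multiplicative updates keep both policies on the probability simplex:

```python
import numpy as np

def entropy_reg_gda(A, x0, y0, tau=0.1, eta=0.1, iters=2000):
    """GDA on f(x, y) = x^T A y - tau*H(x) + tau*H(y), H = Shannon entropy.
    x minimizes, y maximizes; both live on the probability simplex."""
    x, y = np.array(x0, float), np.array(y0, float)
    for _ in range(iters):
        gx = A @ y + tau * (np.log(x) + 1.0)      # grad_x f (entropy term: -tau*H(x))
        gy = A.T @ x - tau * (np.log(y) + 1.0)    # grad_y f (entropy term: +tau*H(y))
        x = x * np.exp(-eta * gx); x /= x.sum()   # multiplicative (mirror) descent step
        y = y * np.exp(eta * gy);  y /= y.sum()   # multiplicative ascent step
    return x, y
```

On matching pennies, where plain gradient descent/ascent cycles, the entropy term makes the iterates contract toward the regularized equilibrium (here, the uniform policies).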


Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games

This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games, and proposes a single-loop policy optimization method with symmetric updates from both agents, which achieves a last-iterate linear convergence to the quantal response equilibrium of the regularized problem.

Abstracting Imperfect Information Away from Two-Player Zero-Sum Games

Samuel Sokota (Carnegie Mellon University), Ryan D'Orazio (Mila, Université de Montréal), et al.

Independent and Decentralized Learning in Markov Potential Games

We propose a multi-agent reinforcement learning dynamics and analyze its convergence properties in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized learning setting.

Solving a Class of Non-Convex Min-Max Games Using Iterative First Order Methods

This paper proposes a multi-step gradient descent-ascent algorithm that finds an $\varepsilon$-first-order stationary point of the game in $\widetilde{O}(\varepsilon^{-3.5})$ iterations, which is the best known rate in the literature.
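The multi-step structure, several ascent steps on the inner (concave) variable before each descent step on the outer variable, can be sketched as follows; the toy objective, step sizes, and iteration counts below are illustrative choices, not the paper's setting:

```python
import numpy as np

def multistep_gda(grad_x, grad_y, x0, y0, eta_x=0.05, eta_y=0.1,
                  inner=20, outer=500):
    """Outer descent on x, with `inner` ascent steps on y per outer step."""
    x, y = float(x0), float(y0)
    for _ in range(outer):
        for _ in range(inner):            # approximately solve the inner max over y
            y += eta_y * grad_y(x, y)
        x -= eta_x * grad_x(x, y)         # one descent step on the outer variable
    return x, y
```

As a check, take f(x, y) = x^4/4 - x^2/2 + x*y - y^2/2, which is nonconvex in x and strongly concave in y; the inner maximizer is y*(x) = x, so the envelope is x^4/4 and the method should drive x toward its stationary point at 0.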

Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium

This work develops provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves and proposes an optimistic variant of the least-squares minimax value iteration algorithm.

On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging

The stochastic bilinear minimax optimization problem is studied: an analysis of the same-sample stochastic extragradient method with constant step size is given, along with variants of the method that yield favorable convergence.
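A minimal sketch of the same-sample stochastic extragradient update with iterate averaging on an unconstrained bilinear game (plain averaging rather than the paper's restarted variant; the matrix, step size, and noise level are illustrative choices):

```python
import numpy as np

def same_sample_seg(A, x0, y0, eta=0.1, iters=2000, noise=0.01, seed=0):
    """min_x max_y x^T A y; 'same-sample' means the noise realization drawn
    for the extrapolation step is reused in the update step."""
    rng = np.random.default_rng(seed)
    x, y = np.array(x0, float), np.array(y0, float)
    x_sum, y_sum = np.zeros_like(x), np.zeros_like(y)
    for _ in range(iters):
        xi_x = noise * rng.standard_normal(x.shape)
        xi_y = noise * rng.standard_normal(y.shape)
        xh = x - eta * (A @ y + xi_x)            # extrapolation step
        yh = y + eta * (A.T @ x + xi_y)
        x = x - eta * (A @ yh + xi_x)            # update step, same noise sample
        y = y + eta * (A.T @ xh + xi_y)
        x_sum += x; y_sum += y
    return x_sum / iters, y_sum / iters          # averaged iterates
```

For an invertible A the unique saddle point is the origin, so the averaged iterates should end up close to zero despite the gradient noise.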

Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games

This is the first quantitative analysis of policy gradient methods with function approximation for two-player zero-sum Markov games; the algorithms' performance is thoroughly characterized in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error.

A unified view of entropy-regularized Markov decision processes

A general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs) is proposed, showing that using the conditional entropy of the joint state-action distributions as regularization yields a dual optimization problem closely resembling the Bellman optimality equations.
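The entropy-regularized ("soft") Bellman operator underlying this line of work can be illustrated with soft value iteration; note this is a discounted-reward sketch, whereas the paper treats the average-reward case, and the MDP below is an arbitrary example:

```python
import numpy as np

def soft_value_iteration(P, r, gamma=0.9, tau=0.5, iters=500):
    """P: (S, A, S) transition tensor, r: (S, A) rewards.
    Iterates V(s) = tau * log sum_a exp(Q(s, a) / tau)  (soft backup)."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r + gamma * (P @ V)                        # (S, A) soft Q-values
        m = Q.max(axis=1, keepdims=True)               # stabilized log-sum-exp
        V = (m + tau * np.log(np.exp((Q - m) / tau).sum(axis=1, keepdims=True))).ravel()
    pi = np.exp((Q - Q.max(axis=1, keepdims=True)) / tau)
    return V, pi / pi.sum(axis=1, keepdims=True)       # soft-optimal softmax policy
```

Because the soft backup is a gamma-contraction, the returned V satisfies the soft Bellman fixed-point equation to high accuracy, and the induced policy is the softmax of the Q-values at temperature tau.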

Better Theory for SGD in the Nonconvex World

A new variant of the recently introduced expected smoothness assumption, which governs the behaviour of the second moment of the stochastic gradient, is proposed, and this assumption is shown to be both more general and more reasonable than the assumptions made in all prior work.

What is Local Optimality in Nonconvex-Nonconcave Minimax Optimization?

A proper mathematical definition of local optimality for this sequential setting, termed local minimax, is proposed, and its properties and existence results are presented.

On the Global Convergence Rates of Softmax Policy Gradient Methods

It is shown that with the true gradient, policy gradient with a softmax parametrization converges at a $O(1/t)$ rate, with constants depending on the problem and initialization, significantly strengthening recent asymptotic convergence results.
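In the simplest single-state (bandit) case, the exact softmax policy gradient has the closed form dJ/dθ_a = π_a (r_a − π·r); the toy run below (reward vector, step size, and iteration count are arbitrary choices for illustration) shows the probability of the best action approaching 1:

```python
import numpy as np

def softmax_pg_bandit(r, eta=0.4, iters=5000):
    """Exact (true-gradient) softmax policy gradient on a multi-armed bandit."""
    r = np.asarray(r, float)
    theta = np.zeros_like(r)
    for _ in range(iters):
        pi = np.exp(theta - theta.max()); pi /= pi.sum()   # softmax policy
        theta += eta * pi * (r - pi @ r)   # dJ/dtheta_a = pi_a * (r_a - pi.r)
    pi = np.exp(theta - theta.max()); pi /= pi.sum()
    return pi
```

The O(1/t) rate means the suboptimality gap shrinks polynomially rather than geometrically, so the policy concentrates on the best arm but never reaches a vertex of the simplex in finite time.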

Independent Policy Gradient Methods for Competitive Reinforcement Learning

It is shown that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
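A sketch of the two-timescale idea on a zero-sum matrix game: each player independently updates its own policy from its own payoff gradient, with the min player's learning rate much smaller than the max player's. The exponentiated updates here are a simplification of the paper's projected gradient scheme, the matrix and step sizes are illustrative, and the check below uses the average iterates rather than the paper's last-iterate guarantee:

```python
import numpy as np

def two_timescale_pg(A, eta_x=0.01, eta_y=0.1, iters=20000):
    """min player x (slow timescale) vs max player y (fast timescale)
    on the game min_x max_y x^T A y over the probability simplex."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    x_sum, y_sum = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        gx, gy = A @ y, A.T @ x             # each player's own payoff gradient
        x = x * np.exp(-eta_x * gx); x /= x.sum()   # slow descent update
        y = y * np.exp(eta_y * gy);  y /= y.sum()   # fast ascent update
        x_sum += x; y_sum += y
    return x_sum / iters, y_sum / iters
```

Since both updates are no-regret, the time-averaged policies approach the min-max equilibrium of the game; on matching pennies that equilibrium is the uniform strategy for both players.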

A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning

The main results reproduce the best-known convergence rates for the general policy optimization problem and show how they can be used to derive a state-of-the-art rate for online linear-quadratic regulator (LQR) controllers.