Corpus ID: 246679828

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

@article{Ding2022IndependentPG,
  title={Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence},
  author={Dongsheng Ding and Chen-Yu Wei and Kaiqing Zhang and Mihailo R. Jovanović},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.04129}
}
We examine global non-asymptotic convergence properties of policy gradient methods for multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To learn a Nash equilibrium of an MPG in which the size of state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an ε-Nash equilibrium with O… 
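To make the independent-update idea concrete, here is a minimal sketch of independent projected policy gradient on a toy single-state, identical-interest matrix game (a degenerate Markov potential game). Each player ascends its own expected payoff using only its own exact gradient, with no coordination; the payoff matrix, step size, and iteration count are hypothetical, and the sketch illustrates the update style rather than the paper's exact algorithm or rates.

```python
import numpy as np

# Toy single-state Markov potential game: a 2-player identical-interest
# matrix game. Both players receive the same payoff, so the potential
# function coincides with the common payoff. Payoff values are hypothetical.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 0.6]])

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    tau = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + tau, 0.0)

def independent_pg(payoff, eta=0.1, iters=500):
    """Each player runs projected gradient ascent on its own expected payoff,
    using only its own gradient -- no coordination or shared information."""
    n1, n2 = payoff.shape
    x = np.ones(n1) / n1   # player 1 mixed strategy
    y = np.ones(n2) / n2   # player 2 mixed strategy
    for _ in range(iters):
        gx = payoff @ y    # exact gradient of x^T A y w.r.t. x
        gy = payoff.T @ x  # exact gradient of x^T A y w.r.t. y
        # Simultaneous, independent projected-gradient updates.
        x = project_simplex(x + eta * gx)
        y = project_simplex(y + eta * gy)
    return x, y

x, y = independent_pg(PAYOFF)
print("player 1 policy:", x.round(3))
print("player 2 policy:", y.round(3))
print("expected common payoff:", float(x @ PAYOFF @ y))
```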


Learning in Congestion Games with Bandit Feedback

TLDR
This paper investigates congestion games, a class of games with benign theoretical structure and broad real-world applications, and proposes a centralized algorithm based on the optimism in the face of uncertainty principle and a decentralized algorithm via a novel combination of the Frank-Wolfe method and G-optimal design.

Efficiently Computing Nash Equilibria in Adversarial Team Markov Games

TLDR
The main contribution is an algorithm for computing stationary ε-approximate Nash equilibria in adversarial team Markov games, with computational complexity that is polynomial in all the natural parameters of the game as well as 1/ε.

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

TLDR
Self-Play PSRO (SP-PSRO) is introduced, a method that adds an approximately optimal stochastic policy to the population in each iteration and empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

TLDR
This paper proposes an unbiased model-free method, ESCHER, that is principled and is guaranteed to converge to an approximate Nash equilibrium with high probability in the tabular case, and shows that a deep learning version of ESCHER outperforms the prior state of the art, DREAM and neural fictitious self-play (NFSP), with the difference becoming dramatic as game size increases.

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

TLDR
An algorithmic framework for two-player zero-sum Markov games in the full-information setting, in which each iteration consists of a policy update step at each state using a certain matrix game algorithm and a value update step with a certain learning rate.

Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games

TLDR
A new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS), which achieves √K regret in the context of general function approximation, where K is the number of episodes; DORIS maintains a hyperpolicy, which is a distribution over the policy space.

Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

TLDR
This work considers two-agent multi-armed bandits and Markov decision processes with a hierarchical information structure arising in applications, and proposes simpler and more efficient algorithms that require no coordination or communication.

Logit-Q Learning in Markov Games

We present new independent learning dynamics provably converging to an efficient equilibrium (also known as optimal equilibrium) maximizing the social welfare in infinite-horizon discounted Markov games.

Fictitious Play in Markov Games with Single Controller

Certain but important classes of strategic-form games, including zero-sum and identical-interest games, have the fictitious-play property (FPP), i.e., beliefs formed in fictitious play dynamics always converge to a Nash equilibrium.

Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization

TLDR
The proposed entropy-regularized NPG method enables each agent to deploy symmetric, decentralized, and multiplicative updates according to its own payoff, and it is shown that the method converges to the quantal response equilibrium (QRE), the equilibrium of the entropy-regularized game, at a sublinear rate.
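As a rough illustration of such decentralized multiplicative updates, the sketch below runs an entropy-regularized, NPG-style update on a toy two-player identical-interest matrix game. The payoffs, step size, regularization weight, and exact update form are assumptions for illustration, not the cited paper's algorithm.

```python
import numpy as np

# Hypothetical 2x2 identical-interest payoffs (illustration only).
A1 = np.array([[2.0, 0.0], [0.0, 1.0]])   # player 1's payoff
A2 = A1.copy()                             # identical interest => potential game

def entropy_reg_npg(A1, A2, eta=0.2, tau=0.05, iters=300):
    """Symmetric, decentralized multiplicative updates: each agent rescales its
    own policy by exp(eta * payoff) and shrinks toward uniform via the
    entropy-regularization exponent (1 - eta*tau). A sketch of the update
    style, not the cited paper's exact algorithm."""
    x = np.ones(2) / 2
    y = np.ones(2) / 2
    for _ in range(iters):
        q1 = A1 @ y            # player 1's expected payoff per action
        q2 = A2.T @ x          # player 2's expected payoff per action
        x = x ** (1 - eta * tau) * np.exp(eta * q1)
        y = y ** (1 - eta * tau) * np.exp(eta * q2)
        x, y = x / x.sum(), y / y.sum()
    return x, y

x, y = entropy_reg_npg(A1, A2)
print("approximate QRE policies:", x.round(3), y.round(3))
```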

References

Showing 1-10 of 96 references

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

TLDR
It is shown that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates.

Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games

TLDR
A novel definition of Markov potential games (MPGs) is presented that generalizes prior attempts at capturing complex stateful multi-agent coordination, and convergence of independent policy gradient to Nash policies is proved (polynomially fast in the approximation error) by adapting recent gradient-dominance arguments developed for single-agent MDPs to multi-agent learning settings.

Equilibrium in a stochastic n-person game

Heuristically, a stochastic game is described by a sequence of states which are determined stochastically. The stochastic element arises from a set of transition probability measures.

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

TLDR
A decentralized algorithm that provably converges to the set of Nash equilibria under self-play, and is simultaneously rational, convergent, agnostic, symmetric, and enjoying a finite-time last-iterate convergence guarantee.

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

TLDR
This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.

Independent Natural Policy Gradient Always Converges in Markov Potential Games

TLDR
This paper proves that independent natural policy gradient always converges with constant learning rates in Markov potential games, a particular class of multi-agent stochastic games.

Gradient Play in Multi-Agent Markov Stochastic Games: Stationary Points and Convergence

TLDR
For Markov potential games, it is proved that strict NEs are local maxima of the total potential function and fully-mixed NEs are saddle points, and a local convergence rate around strict NEs is given for more general settings.
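A quick numerical check of this structure on a 2x2 identical-interest game (payoffs chosen arbitrarily for illustration): the potential attains a local maximum at the strict NE, while at the fully-mixed NE it increases along one direction and decreases along another, i.e., a saddle.

```python
import numpy as np

# 2x2 identical-interest game: potential Phi(p, q) = p*q*a + (1-p)*(1-q)*b,
# where p, q are the probabilities each player puts on action 0.
a, b = 1.0, 0.6                        # hypothetical payoffs, illustration only
phi = lambda p, q: p * q * a + (1 - p) * (1 - q) * b

p_star = q_star = b / (a + b)          # fully-mixed Nash equilibrium
eps = 1e-3

# Strict NE (both play action 0): the potential is locally maximal there.
print("strict NE:", phi(1, 1), ">", phi(1 - eps, 1 - eps))
# Fully-mixed NE: the potential rises along the diagonal perturbation and
# falls along the anti-diagonal one, i.e., a saddle point.
print("mixed NE value :", round(phi(p_star, q_star), 6))
print("diagonal move  :", round(phi(p_star + eps, q_star + eps), 6))
print("anti-diag move :", round(phi(p_star + eps, q_star - eps), 6))
```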

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Chaos of Learning Beyond Zero-sum and Coordination via Game Decompositions

TLDR
A notion of "matrix domination" and a linear-programming-based characterization are proposed and used to identify bimatrix games where MWU is Lyapunov chaotic almost everywhere, indicating that chaos is a substantial issue for learning in games.

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

TLDR
This work provides provable characterizations of the computational, approximation, and sample size properties of policy gradient methods in the context of discounted Markov Decision Processes (MDPs), and shows an important interplay between estimation error, approximation error, and exploration.
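For reference, a compact sketch of exact softmax policy gradient on a small randomly generated MDP, using the policy gradient theorem with exact value evaluation; the MDP, step size, and iteration count are hypothetical, and the code only illustrates the direct/softmax parameterization analyzed in this line of work.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9

# Hypothetical small MDP (random transitions and rewards), illustration only.
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.uniform(size=(S, A))                 # immediate rewards
rho = np.ones(S) / S                         # start-state distribution

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def value(pi):
    """Exact state values: V = (I - gamma * P_pi)^{-1} r_pi."""
    P_pi = np.einsum('sa,san->sn', pi, P)
    r_pi = np.einsum('sa,sa->s', pi, R)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def pg_step(theta, eta=5.0):
    """One exact softmax policy-gradient step; the 1/(1-gamma) factor of the
    policy gradient theorem is absorbed into the step size."""
    pi = softmax(theta)
    V = value(pi)
    Q = R + gamma * (P @ V)                  # state-action values
    P_pi = np.einsum('sa,san->sn', pi, P)
    # Discounted state-visitation distribution under pi, started from rho.
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * rho)
    adv = Q - (pi * Q).sum(axis=1, keepdims=True)
    grad = d[:, None] * pi * adv
    return theta + eta * grad, float(rho @ V)

theta = np.zeros((S, A))
for _ in range(200):
    theta, J = pg_step(theta)
print("value of learned policy from rho:", round(J, 4))
```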
...