Corpus ID: 246679828

Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence

@article{Ding2022IndependentPG,
  title={Independent Policy Gradient for Large-Scale Markov Potential Games: Sharper Rates, Function Approximation, and Game-Agnostic Convergence},
  author={Dongsheng Ding and Chen-Yu Wei and K. Zhang and Mihailo R. Jovanovi{\'c}},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.04129}
}
We examine global non-asymptotic convergence properties of policy gradient methods for multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To learn a Nash equilibrium of an MPG in which the size of the state space and/or the number of players can be very large, we propose new independent policy gradient algorithms that are run by all players in tandem. When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an ε-Nash equilibrium with O…
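
To give a feel for the independent-update idea the abstract describes, here is a minimal, hypothetical sketch: each player runs projected policy gradient ascent on its own strategy using only its own payoff gradient, with no coordination. It is deliberately restricted to a single-state identical-interest (hence potential) game with exact gradients; the payoff matrix, step size, and projection routine are illustrative assumptions, not the paper's algorithm, which handles general MPGs, sample-based gradients, and function approximation.

import numpy as np

def project_to_simplex(v):
    # Euclidean projection onto the probability simplex.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    lam = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + lam, 0.0)

# Two-player identical-interest game (a potential game): both players receive
# payoff P[a1, a2]; the matrix below is an illustrative assumption.
P = np.array([[1.0, 0.0],
              [0.0, 2.0]])
eta = 0.1                     # step size (assumed)
pi1 = np.full(2, 0.5)         # player 1's mixed strategy
pi2 = np.full(2, 0.5)         # player 2's mixed strategy

for _ in range(500):
    # Each player ascends the gradient of its own expected payoff, evaluated
    # against the other player's current strategy, independently.
    g1 = P @ pi2
    g2 = P.T @ pi1
    pi1 = project_to_simplex(pi1 + eta * g1)
    pi2 = project_to_simplex(pi2 + eta * g2)

print(pi1, pi2)   # approaches the pure Nash equilibrium (a1 = 1, a2 = 1)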


Learning in Congestion Games with Bandit Feedback

This paper proposes a centralized algorithm based on the optimism in the face of uncertainty principle for congestion games with (semi-)bandit feedback, and proposes a decentralized algorithm via a novel combination of the Frank-Wolfe method and G-optimal design.

Efficiently Computing Nash Equilibria in Adversarial Team Markov Games

The main contribution is an algorithm for computing stationary ε-approximate Nash equilibria in adversarial team Markov games, with computational complexity that is polynomial in all the natural parameters of the game as well as 1/ε.

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

A Localized Policy Iteration (LPI) algorithm is proposed that provably learns a near-globally-optimal policy using only local information and explicitly captures the trade-off between optimality and computational complexity in the choice of κ.

How Bad is Selfish Driving? Bounding the Inefficiency of Equilibria in Urban Driving Games

We consider the interaction among agents engaging in a driving task and model it as a general-sum game. This class of games exhibits a plurality of different equilibria, posing the issue of equilibrium selection.

On the convergence of policy gradient methods to Nash equilibria in general stochastic games

This work examines the long-run behavior of policy gradient methods with respect to Nash equilibrium policies that are second-order stationary (SOS). It shows that SOS policies are locally attracting with high probability, and that policy gradient trajectories with gradient estimates provided by the REINFORCE algorithm achieve an O(1/√n) distance-squared convergence rate if the method's step-size is chosen appropriately.
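
As a point of reference for the REINFORCE-style estimates mentioned above, the following is a hedged sketch of the score-function gradient estimator for a softmax policy in a one-step environment; the reward model, step-size schedule, and constants are assumptions made for illustration, not the cited paper's setting.

import numpy as np

rng = np.random.default_rng(0)
K = 4
mean_reward = np.array([0.1, 0.5, 0.2, 0.9])    # assumed environment
theta = np.zeros(K)                              # softmax policy parameters

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_gradient(theta):
    # Single-sample, unbiased score-function estimate of the policy gradient.
    pi = softmax(theta)
    a = rng.choice(K, p=pi)                           # sample an action
    r = mean_reward[a] + 0.1 * rng.standard_normal()  # noisy reward
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0                             # d log pi(a) / d theta
    return r * grad_log_pi

for t in range(5000):
    theta += 0.5 / np.sqrt(t + 1) * reinforce_gradient(theta)

print(softmax(theta))   # probability mass concentrates on the best action (index 3)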

Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games

Self-Play PSRO (SP-PSRO) is introduced, a method that adds an approximately optimal stochastic policy to the population in each iteration and empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.

ESCHER: Eschewing Importance Sampling in Games by Computing a History Value Function to Estimate Regret

This paper proposes an unbiased model-free method, ESCHER, that is principled and is guaranteed to converge to an approximate Nash equilibrium with high probability and is able to beat DREAM and NFSP in a head-to-head competition over 90% of the time.

Policy Optimization for Markov Games: Unified Framework and Faster Convergence

An algorithmic framework for two-player zero-sum Markov games in the full-information setting, where each iteration consists of a policy update step at each state using a certain matrix-game algorithm, and a value update step with a certain learning rate.
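
To make the shape of such a framework concrete, here is a hedged sketch on a tiny randomly generated two-player zero-sum Markov game: each iteration updates the per-state policies with a matrix-game algorithm (multiplicative weights is used here purely for illustration) and then updates the value estimates with a learning rate. The game, the choice of matrix-game algorithm, and all constants are assumptions, not the paper's instantiation.

import numpy as np

rng = np.random.default_rng(1)
S, A, gamma = 3, 2, 0.9
R = rng.uniform(-1.0, 1.0, size=(S, A, A))       # reward to the max player
P = rng.dirichlet(np.ones(S), size=(S, A, A))    # transitions P[s, a, b, :]

V = np.zeros(S)                 # value estimates
x = np.full((S, A), 1.0 / A)    # max player's per-state policy
y = np.full((S, A), 1.0 / A)    # min player's per-state policy
eta, alpha = 0.1, 0.1           # matrix-game step size and value learning rate (assumed)

for _ in range(2000):
    for s in range(S):
        Q = R[s] + gamma * P[s] @ V       # stage-game payoff matrix Q[a, b] at state s
        gx, gy = Q @ y[s], Q.T @ x[s]     # payoff of each action against the opponent
        # Policy update step: multiplicative weights, ascent for x, descent for y.
        x[s] = x[s] * np.exp(eta * gx)
        x[s] /= x[s].sum()
        y[s] = y[s] * np.exp(-eta * gy)
        y[s] /= y[s].sum()
        # Value update step with learning rate alpha.
        V[s] = (1 - alpha) * V[s] + alpha * float(x[s] @ Q @ y[s])

print(V)   # per-state value estimates under the evolving policies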

Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games

A new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS), which achieves √K regret in the context of general function approximation, where K is the number of episodes; DORIS maintains a hyperpolicy, which is a distribution over the policy space.

Decentralized Cooperative Reinforcement Learning with Hierarchical Information Structure

This work considers two-agent multi-armed bandits and Markov decision processes with a hierarchical information structure arising in applications, and proposes simpler and more efficient algorithms that require no coordination or communication.

References

Showing 1-10 of 96 references

Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

It is shown that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates.
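
For intuition, here is a hedged sketch of smooth ("Boltzmann") Q-learning in a 2x2 zero-sum game: each agent softmaxes its own Q-values with a positive exploration temperature and nudges them toward its current expected payoffs. The game, temperature, and learning rate are illustrative assumptions rather than the cited paper's exact model.

import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])    # matching pennies: row player gets A, column player gets -A
tau, alpha = 0.5, 0.05         # exploration temperature and learning rate (assumed)

def softmax(q, tau):
    z = (q - q.max()) / tau
    e = np.exp(z)
    return e / e.sum()

Q1 = rng.standard_normal(2)    # row player's action values
Q2 = rng.standard_normal(2)    # column player's action values
for _ in range(5000):
    p1, p2 = softmax(Q1, tau), softmax(Q2, tau)
    Q1 += alpha * (A @ p2 - Q1)      # move toward expected payoffs of row actions
    Q2 += alpha * (-A.T @ p1 - Q2)   # move toward expected payoffs of column actions

print(softmax(Q1, tau), softmax(Q2, tau))   # settles at the quantal-response equilibrium (uniform here)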

Global Convergence of Multi-Agent Policy Gradient in Markov Potential Games

A novel definition of Markov Potential Games (MPG) is presented that generalizes prior attempts at capturing complex stateful multiagent coordination and proves (polynomially fast in the approximation error) convergence of independent policy gradient to Nash policies by adapting recent gradient dominance property arguments developed for single agent MDPs to multi-agent learning settings.

Equilibrium in a stochastic n-person game

Heuristically, a stochastic game is described by a sequence of states which are determined stochastically. The stochastic element arises from a set of transition probability measures.

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

A decentralized algorithm that provably converges to the set of Nash equilibria under self-play, and is simultaneously rational, convergent, agnostic, and symmetric, while enjoying a finite-time last-iterate convergence guarantee.

Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms

This chapter reviews the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two.

Independent Natural Policy Gradient Always Converges in Markov Potential Games

This paper proves that Independent Natural Policy Gradient with constant learning rates always converges in Markov potential games, a particular class of multi-agent stochastic games.

Gradient Play in Multi-Agent Markov Stochastic Games: Stationary Points and Convergence

For Markov potential games, it is proved that strict NEs are local maxima of the total potential function and fully-mixed NEs are saddle points, and a local convergence rate around strict NEs for more general settings is given.

Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory

Chaos of Learning Beyond Zero-sum and Coordination via Game Decompositions

A notion of "matrix domination" and a linear program are proposed and used to characterize bimatrix games where the multiplicative weights update (MWU) is Lyapunov chaotic almost everywhere, indicating that chaos is a substantial issue of learning in games.

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

This work provides provable characterizations of the computational, approximation, and sample size properties of policy gradient methods in the context of discounted Markov Decision Processes (MDPs), and shows an important interplay between estimation error, approximation error, and exploration.
...