Corpus ID: 238408198

Robustness and sample complexity of model-based MARL for general-sum Markov games

Jayakumar Subramanian, Amit Sinha, Aditya Mahajan
Multi-agent reinforcement learning (MARL) is often modeled using the framework of Markov games (also called stochastic games or dynamic games). Most of the existing literature on MARL concentrates on zero-sum Markov games and is not applicable to general-sum Markov games. It is known that the best-response dynamics in general-sum Markov games are not a contraction. Therefore, different equilibria in general-sum Markov games can have different values. Moreover, the Q-function is not sufficient to… 
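The claim that different equilibria can have different values already shows up in one-shot general-sum games. A minimal sketch, using a hypothetical Battle-of-the-Sexes-style 2x2 payoff matrix (not from the paper), enumerates the pure-strategy Nash equilibria of a bimatrix game and prints their value pairs:

```python
import numpy as np

# Hypothetical 2x2 general-sum game (Battle of the Sexes style payoffs),
# chosen only to illustrate that distinct Nash equilibria can have
# distinct values for each player.
A = np.array([[2, 0],
              [0, 1]])  # row player's payoffs
B = np.array([[1, 0],
              [0, 2]])  # column player's payoffs

def pure_nash_equilibria(A, B):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game."""
    eqs = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            # (i, j) is an equilibrium if neither player can gain
            # by unilaterally deviating.
            if A[i, j] >= A[:, j].max() and B[i, j] >= B[i, :].max():
                eqs.append(((i, j), (int(A[i, j]), int(B[i, j]))))
    return eqs

for (i, j), (u1, u2) in pure_nash_equilibria(A, B):
    print(f"equilibrium ({i},{j}): values ({u1}, {u2})")
# prints two equilibria, (0,0) with values (2, 1) and (1,1) with values (1, 2)
```

Because the two equilibria yield different value pairs, no single "game value" exists, which is why zero-sum techniques built on a unique value do not carry over.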


Provably Efficient Offline Multi-agent Reinforcement Learning via Strategy-wise Bonus
For two-player zero-sum Markov games, this paper exploits the convexity of the strategy-wise bonus to propose a computationally efficient algorithm whose sample complexity enjoys a better dependency on the number of actions than prior methods based on the point-wise bonus.
A Survey on Model-based Reinforcement Learning
This survey reviews model-based reinforcement learning (MBRL) with a focus on recent progress in deep RL, and discusses the applicability and advantages of MBRL in real-world tasks.


Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity
This paper studies arguably the most basic MARL setting: two-player discounted zero-sum Markov games, given only access to a generative model, and shows that model-based MARL achieves a sample complexity of $\tilde O(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
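To get a feel for how this bound scales, a quick sanity check with hypothetical problem sizes (the numbers below are illustrative, not from the paper) can evaluate the leading term while ignoring logarithmic factors:

```python
def sample_complexity(S, A, B, gamma, eps):
    """Leading term of the \tilde O(|S||A||B|(1-gamma)^{-3} eps^{-2}) bound,
    ignoring polylogarithmic factors."""
    return S * A * B * (1 - gamma) ** -3 * eps ** -2

# Example: 100 states, 10 actions per player, gamma = 0.9, eps = 0.1.
n = sample_complexity(S=100, A=10, B=10, gamma=0.9, eps=0.1)
print(f"{n:.2e}")  # prints 1.00e+09
```

The cubic dependence on the effective horizon $(1-\gamma)^{-1}$ dominates as $\gamma \to 1$, which is why discount factors close to 1 make the generative-model setting expensive.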
Learning Nash Equilibrium for General-Sum Markov Games from Batch Data
This paper proposes a new definition of $\epsilon$-Nash equilibrium in Markov games (MGs) that captures a strategy's quality in multiplayer games, and introduces a neural network architecture named NashNetwork that successfully learns a Nash equilibrium in a generic multiplayer general-sum turn-based MG.
Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity
The sample complexity of solving discounted two-player turn-based zero-sum stochastic games is settled up to polylogarithmic factors by showing how to generalize a near-optimal Q-learning-based algorithm for MDPs, in particular Sidford et al. (2018), to two-player strategy computation algorithms.
Two-Timescale Algorithms for Learning Nash Equilibria in General-Sum Stochastic Games
It is established that ON-SGSP consistently outperforms the NashQ and FFQ algorithms on a single-state non-generic game as well as on a synthetic two-player game setup with 810,000 states.
Approximations in Dynamic Zero-Sum Games II
This paper studies approximations of values and $\epsilon$-saddle-point policies in dynamic zero-sum games with countable state space and unbounded immediate reward, and uses an extension of the general approximation theorem to study approximation in stochastic games with complete information.
Nonzero-sum Stochastic Games
This paper treats stochastic games, focusing on nonzero-sum games, and provides a detailed survey of selected recent results. In Section 1, we consider stochastic Markov games. A correlation of…
Markov Perfect Equilibrium: I. Observable Actions
This work defines Markov strategy and Markov perfect equilibrium and shows that an MPE is generically robust: if payoffs of a generic game are perturbed, there exists an almost Markovian equilibrium in the perturbed game near the initial MPE.
Multi-agent reinforcement learning algorithms
The theoretical foundations for the convergence of the algorithms proposed in this thesis are given, and an algorithm converging in self-play to Nash equilibria for a high percentage of general-sum discounted stochastic games is proposed.
Cyclic Equilibria in Markov Games
It is proved by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games, and it is shown empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.