Fictitious play in zero-sum stochastic games

@article{Sayin2020FictitiousPI,
  title={Fictitious play in zero-sum stochastic games},
  author={Muhammed O. Sayin and Francesca Parise and Asuman E. Ozdaglar},
  journal={SIAM J. Control Optim.},
  year={2020},
  volume={60},
  pages={2095-2114}
}
We present fictitious play dynamics for the general class of stochastic games and analyze its convergence properties in zero-sum stochastic games. Our dynamics involves agents forming beliefs about the opponent's strategy and about their own continuation payoff (Q-function), and playing a myopic best response using the estimated continuation payoffs. Agents update their beliefs at the visited states, based on observations of the opponent's actions. A key property of the learning dynamics is that the update of the beliefs on Q…
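To make the description above concrete, here is a minimal, illustrative sketch in Python/NumPy of fictitious-play-style dynamics in a two-player zero-sum stochastic game. It is not the paper's exact algorithm or notation: every variable name, the step-size choices, and the assumption that rewards and transitions are known are simplifications made only for this sketch. Each agent keeps an empirical belief about the opponent's strategy at every state and an estimate of its own continuation payoff (Q-function), best-responds myopically against its belief using the estimated Q-values, and updates both quantities only at the visited state.

import numpy as np

# Illustrative sketch only: a randomly generated two-player zero-sum stochastic
# game with a few states and actions. Player 1 receives r, player 2 receives -r.
rng = np.random.default_rng(0)
nS, nA1, nA2 = 3, 2, 2                       # states, player-1 actions, player-2 actions
gamma = 0.8                                  # discount factor

r = rng.standard_normal((nS, nA1, nA2))      # stage payoffs to player 1
P = rng.random((nS, nA1, nA2, nS))           # transition kernel P[s, a1, a2, s']
P /= P.sum(axis=3, keepdims=True)

# Beliefs about the opponent's strategy at each state (empirical action frequencies)
# and each agent's estimate of its own continuation payoff (Q-function).
belief2 = np.full((nS, nA2), 1.0 / nA2)      # player 1's belief about player 2
belief1 = np.full((nS, nA1), 1.0 / nA1)      # player 2's belief about player 1
q1 = np.zeros((nS, nA1, nA2))                # player 1's Q-estimate
q2 = np.zeros((nS, nA1, nA2))                # player 2's Q-estimate (for the -r payoffs)

s = 0
for t in range(1, 50001):
    alpha = 1.0 / t                              # strategy-belief step size (faster timescale)
    beta = 1.0 / (1.0 + t * np.log(t + 1.0))     # Q-estimate step size (slower timescale)

    # Myopic best responses against the beliefs, using estimated continuation payoffs.
    a1 = int(np.argmax(q1[s] @ belief2[s]))
    a2 = int(np.argmax(belief1[s] @ q2[s]))

    # Belief updates at the visited state only, from the observed opponent actions.
    belief2[s] += alpha * (np.eye(nA2)[a2] - belief2[s])
    belief1[s] += alpha * (np.eye(nA1)[a1] - belief1[s])

    # Q-estimate updates at the visited state: stage payoff plus the discounted
    # best-response value of the next state under the current beliefs.
    v1 = np.einsum('sab,sb->sa', q1, belief2).max(axis=1)   # player 1's value per state
    v2 = np.einsum('sa,sab->sb', belief1, q2).max(axis=1)   # player 2's value per state
    q1[s] += beta * (r[s] + gamma * np.einsum('abn,n->ab', P[s], v1) - q1[s])
    q2[s] += beta * (-r[s] + gamma * np.einsum('abn,n->ab', P[s], v2) - q2[s])

    s = int(rng.choice(nS, p=P[s, a1, a2]))      # sample the next state

# In a zero-sum game the two agents' value estimates should roughly mirror each
# other (v1 + v2 near zero) if and when the dynamics settle.
print("v1 + v2 per state:", v1 + v2)

The step sizes are chosen so that beta/alpha vanishes as t grows, mirroring the two-timescale structure mentioned in the abstract: beliefs about strategies adapt faster than beliefs about continuation payoffs.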


On the Global Convergence of Stochastic Fictitious Play in Stochastic Games with Turn-based Controllers

M. O. Sayin, 2022 IEEE 61st Conference on Decision and Control (CDC), 2022
This paper presents a learning dynamic with an almost sure convergence guarantee for any stochastic game with turn-based controllers (on state transitions), as long as the stage payoffs have stochastic…

Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions

Recent extensions of the well-known fictitious play learning procedure from static games to dynamic games (Leslie et al. [2020], Sayin et al. [2021], Baudin and Laraki [2022]) were proved to globally…

Decentralized Q-Learning in Zero-sum Markov Games

A radically uncoupled Q-learning dynamics that is both rational and convergent is developed: the learning dynamics converges to the best response to the opponent's strategy when the opponent follows an asymptotically stationary strategy; when both agents adopt the learning dynamics, they converge to the Nash equilibrium of the game.

Independent and Decentralized Learning in Markov Potential Games

We propose a multi-agent reinforcement learning dynamics, and analyze its convergence properties in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized…

Logit-Q Learning in Markov Games

We present new independent learning dynamics provably converging to an efficient equilibrium (also known as optimal equilibrium) maximizing the social welfare in infinite-horizon discounted…

On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games

A novel Lyapunov function is formulated, and almost sure convergence is shown under the standard assumptions of two-timescale stochastic approximation methods when the discount factor is less than the product of the ratios of the player-dependent step sizes.

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

A decentralized algorithm that provably converges to the set of Nash equilibria under self-play, and is simultaneously rational, convergent, agnostic, and symmetric, with a finite-time last-iterate convergence guarantee.

The Confluence of Networks, Games and Learning

A selective overview of game-theoretic learning algorithms within the framework of stochastic approximation theory, and associated applications in some representative contexts of modern network systems, such as next-generation wireless communication networks, the smart grid, and distributed machine learning.

Fictitious Play and Best-Response Dynamics in Identical Interest and Zero-Sum Stochastic Games

This paper proposes an extension of fictitious play (FP) (Brown, 1951; Robinson, 1951), a popular decentralized discrete-time learning procedure for repeated static games, to a dynamic model…

References

Showing 1–10 of 61 references

Learning Mixed Equilibria

We study learning processes for finite strategic-form games, in which players use the history of past play to forecast play in the current period. In a generalization of fictitious play, we assume…

Payoff-Based Dynamics for Multiplayer Weakly Acyclic Games

This work introduces three different payoff-based processes for increasingly general scenarios and proves that, after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability.

Individual Q-Learning in Normal Form Games

This work considers the behavior of value-based learning agents in the multi-agent multi-armed bandit problem, and shows that such agents cannot generally play at a Nash equilibrium, although if smooth best responses are used, a Nash distribution can be reached.

Fictitious play in stochastic games

It is shown that the fictitious play process for bimatrix games does not necessarily converge, not even in the 2 × 2 × 2 case with a unique equilibrium in stationary strategies.

Robustness Properties in Fictitious-Play-Type Algorithms

This paper provides a unified analysis of the behavior of FP-type algorithms under an important class of perturbations, thus demonstrating robustness to deviations from the idealistic operating conditions that have been previously assumed.

Fictitious play applied to sequences of games and discounted stochastic games

In this paper, we show that the iterative method of Brown and Robinson for solving a matrix game is also applicable to a converging sequence of matrices, where the players choose at stage t a row…

Equilibrium in a stochastic $n$-person game

Heuristically, a stochastic game is described by a sequence of states which are determined stochastically. The stochastic element arises from a set of transition probability measures. …

Generalised weakened fictitious play

...