Fictitious play in zero-sum stochastic games
@article{Sayin2020FictitiousPI, title={Fictitious play in zero-sum stochastic games}, author={Muhammed O. Sayin and Francesca Parise and Asuman E. Ozdaglar}, journal={SIAM J. Control. Optim.}, year={2020}, volume={60}, pages={2095-2114} }
We present fictitious play dynamics for the general class of stochastic games and analyze its convergence properties in zero-sum stochastic games. Our dynamics involves agents forming beliefs on opponent strategy and their own continuation payoff (Q-function), and playing a myopic best response using estimated continuation payoffs. Agents update their beliefs at states visited from observations of opponent actions. A key property of the learning dynamics is that update of the beliefs on Q…
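The dynamics described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' exact update rule: the function name `fictitious_play_stochastic`, the step sizes `alpha` and `beta`, the tie-breaking rule, and the assumption that both agents know the stage payoffs `r` and the transition kernel `p` are all choices made here for illustration only.

```python
import numpy as np

def fictitious_play_stochastic(r, p, gamma=0.5, steps=5000, seed=0):
    """Illustrative fictitious play in a two-player zero-sum stochastic game.

    r[s, a, b]     : stage payoff to player 1 (player 2 receives -r).
    p[s, a, b, s2] : probability of moving from state s to s2 (assumed known).
    """
    rng = np.random.default_rng(seed)
    n_s, n_a, n_b = r.shape
    # Each agent's belief on the opponent's mixed strategy, per state.
    belief_on_1 = np.full((n_s, n_a), 1.0 / n_a)   # held by player 2
    belief_on_2 = np.full((n_s, n_b), 1.0 / n_b)   # held by player 1
    # Each agent's belief on its own continuation payoff (Q-function).
    q1 = np.zeros((n_s, n_a, n_b))
    q2 = np.zeros((n_s, n_a, n_b))
    visits = np.zeros(n_s, dtype=int)
    s = 0
    for _ in range(steps):
        visits[s] += 1
        k = visits[s]
        # Myopic best responses against the current beliefs (ties -> first index).
        a = int(np.argmax(q1[s] @ belief_on_2[s]))
        b = int(np.argmax(belief_on_1[s] @ q2[s]))
        # Beliefs are updated only at the visited state, from observed actions.
        alpha = 1.0 / k                                # assumed step size
        belief_on_1[s] += alpha * (np.eye(n_a)[a] - belief_on_1[s])
        belief_on_2[s] += alpha * (np.eye(n_b)[b] - belief_on_2[s])
        # Value each agent expects at every state under its current beliefs.
        v1 = np.array([np.max(q1[t] @ belief_on_2[t]) for t in range(n_s)])
        v2 = np.array([np.max(belief_on_1[t] @ q2[t]) for t in range(n_s)])
        # Q-function update with a smaller step size (an assumption of this sketch).
        beta = 1.0 / (1.0 + k * np.log(k + 1.0))
        q1[s] += beta * (r[s] + gamma * (p[s] @ v1) - q1[s])
        q2[s] += beta * (-r[s] + gamma * (p[s] @ v2) - q2[s])
        # Sample the next state from the transition kernel.
        s = int(rng.choice(n_s, p=p[s, a, b]))
    return belief_on_1, belief_on_2, q1, q2

# Usage: matching pennies embedded as a single-state stochastic game.
r = np.array([[[1.0, -1.0], [-1.0, 1.0]]])
p = np.ones((1, 2, 2, 1))
b1, b2, q1, q2 = fictitious_play_stochastic(r, p)
```

In the single-state example the game reduces to repeated matching pennies, so both beliefs should drift toward the uniform mixed strategy.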
18 Citations
On the Global Convergence of Stochastic Fictitious Play in Stochastic Games with Turn-based Controllers
- Economics, 2022 IEEE 61st Conference on Decision and Control (CDC)
- 2022
This paper presents a learning dynamic with almost sure convergence guarantee for any stochastic game with turn-based controllers (on state transitions) as long as stage-payoffs have stochastic…
Smooth Fictitious Play in Stochastic Games with Perturbed Payoffs and Unknown Transitions
- Economics, ArXiv
- 2022
Recent extensions to dynamic games (Leslie et al. [2020], Sayin et al. [2021], Baudin and Laraki [2022]) of the well-known fictitious play learning procedure in static games were proved to globally…
Decentralized Q-Learning in Zero-sum Markov Games
- Economics, NeurIPS
- 2021
A radically uncoupled Q-learning dynamics that is both rational and convergent is developed: the learning dynamics converges to the best response to the opponent’s strategy when the opponent follows an asymptotically stationary strategy; when both agents adopt the learning dynamics, they converge to the Nash equilibrium of the game.
Independent and Decentralized Learning in Markov Potential Games
- Economics, ArXiv
- 2022
We propose a multi-agent reinforcement learning dynamics, and analyze its convergence properties in infinite-horizon discounted Markov potential games. We focus on the independent and decentralized…
Logit-Q Learning in Markov Games
- Economics, ArXiv
- 2022
We present new independent learning dynamics provably converging to an efficient equilibrium (also known as optimal equilibrium) maximizing the social welfare in infinite-horizon discounted…
On the Heterogeneity of Independent Learning Dynamics in Zero-sum Stochastic Games
- Mathematics, L4DC
- 2022
A novel Lyapunov function is formulated, and almost sure convergence of the dynamics is shown under the standard assumptions of two-timescale stochastic approximation methods when the discount factor is less than the product of the ratios of player-dependent step sizes.
Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games
- Computer Science, COLT
- 2021
A decentralized algorithm that provably converges to the set of Nash equilibria under self-play, is simultaneously rational, convergent, agnostic, and symmetric, and enjoys a finite-time last-iterate convergence guarantee.
Decentralized Inertial Best-Response with Voluntary and Limited Communication in Random Communication Networks
- Economics, Autom.
- 2022
The Confluence of Networks, Games and Learning
- Computer Science, ArXiv
- 2021
A selective overview of game-theoretic learning algorithms within the framework of stochastic approximation theory, and associated applications in some representative contexts of modern network systems, such as next-generation wireless communication networks, the smart grid, and distributed machine learning.
Fictitious Play and Best-Response Dynamics in Identical Interest and Zero-Sum Stochastic Games
- Economics, ICML
- 2022
This paper proposes an extension of a popular decentralized discrete-time learning procedure when repeating a static game called fictitious play (FP) (Brown, 1951; Robinson, 1951) to a dynamic model…
References
Learning Mixed Equilibria
- Economics
- 1993
We study learning processes for finite strategic-form games, in which players use the history of past play to forecast play in the current period. In a generalization of fictitious play, we assume…
Payoff-Based Dynamics for Multiplayer Weakly Acyclic Games
- Economics, SIAM J. Control. Optim.
- 2009
This work introduces three different payoff-based processes for increasingly general scenarios and proves that, after a sufficiently large number of stages, player actions constitute a Nash equilibrium at any stage with arbitrarily high probability.
Individual Q-Learning in Normal Form Games
- Computer Science, SIAM J. Control. Optim.
- 2005
This work considers the behavior of value-based learning agents in the multi-agent multi-armed bandit problem, and shows that such agents cannot generally play at a Nash equilibrium, although if smooth best responses are used, a Nash distribution can be reached.
Fictitious play in stochastic games
- Economics, Math. Methods Oper. Res.
- 2007
It is shown that the fictitious play process for bimatrix games does not necessarily converge, not even in the 2 × 2 × 2 case with a unique equilibrium in stationary strategies.
Robustness Properties in Fictitious-Play-Type Algorithms
- Computer Science, SIAM J. Control. Optim.
- 2017
This paper provides a unified analysis of the behavior of FP-type algorithms under an important class of perturbations, thus demonstrating robustness to deviations from the idealistic operating conditions that have been previously assumed.
Fictitious play applied to sequences of games and discounted stochastic games
- Mathematics
- 1982
In this paper, we show that the iterative method of Brown and Robinson, for solving a matrix game, is also applicable to a converging sequence of matrices, where the players choose at stage t a row…
Equilibrium in a stochastic $n$-person game
- Economics
- 1964
Heuristically, a stochastic game is described by a sequence of states which are determined stochastically. The stochastic element arises from a set of transition probability measures. The…