$O\left(1/T\right)$ Time-Average Convergence in a Generalization of Multiagent Zero-Sum Games
@inproceedings{Bailey2021Oleft1TrightTC, title={$O\left(1/T\right)$ Time-Average Convergence in a Generalization of Multiagent Zero-Sum Games}, author={James P. Bailey}, year={2021} }
We introduce a generalization of zero-sum network multiagent matrix games and prove that alternating gradient descent converges to the set of Nash equilibria at rate $O(1/T)$ for this set of games. Alternating gradient descent obtains this convergence guarantee while using fixed learning rates that are four times larger than the optimistic variant of gradient descent. Experimentally, we show with 97.5% confidence that, on average, these larger learning rates result in time-averaged strategies…
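As a rough illustration of the update scheme the abstract refers to, the following is a minimal sketch of alternating gradient descent-ascent with a fixed learning rate. It assumes the simplest special case, a two-player unconstrained bilinear game $f(x, y) = x^\top A y$ whose unique equilibrium is the origin when $A$ is invertible; it does not reproduce the paper's multiagent generalization or its experiments. The names `alternating_gda` and `distance_to_equilibrium`, the matrix `A`, and the learning rate `eta` are hypothetical choices made for this example.

```python
import math


def alternating_gda(A, x0, y0, eta, T):
    """Run T rounds of alternating gradient descent-ascent on f(x, y) = x^T A y.

    Player x minimizes f and player y maximizes f. The players move in turn:
    y responds to the already-updated x. Returns the time-averaged iterates.
    (Illustrative sketch only; the two-player unconstrained setting is an
    assumption, not the paper's general model.)
    """
    x, y = list(x0), list(y0)
    n, m = len(x), len(y)
    x_sum, y_sum = [0.0] * n, [0.0] * m
    for _ in range(T):
        # Gradient of x^T A y with respect to x is A y; x takes a descent step.
        Ay = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        x = [x[i] - eta * Ay[i] for i in range(n)]
        # Gradient with respect to y is A^T x; y ascends using the NEW x.
        ATx = [sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        y = [y[j] + eta * ATx[j] for j in range(m)]
        x_sum = [x_sum[i] + x[i] for i in range(n)]
        y_sum = [y_sum[j] + y[j] for j in range(m)]
    return [s / T for s in x_sum], [s / T for s in y_sum]


def distance_to_equilibrium(x_bar, y_bar):
    """Euclidean distance of the averaged profile from the origin."""
    return math.sqrt(sum(c * c for c in x_bar + y_bar))


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical invertible payoff matrix
    for T in (100, 1000, 10000):
        x_bar, y_bar = alternating_gda(A, [1.0, 1.0], [1.0, 1.0], eta=0.1, T=T)
        print(T, distance_to_equilibrium(x_bar, y_bar))
```

In this toy setting the iterates themselves keep cycling around the equilibrium, while the distance of the time average shrinks as `T` grows. That is the qualitative behavior a time-average guarantee describes.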
References
Multiplicative Weights Update in Zero-Sum Games
- Economics
- EC
- 2018
If equilibria are indeed predictive even for the benchmark class of zero-sum games, agents in practice must deviate robustly from the axiomatic perspective of optimization driven dynamics as captured by MWU and variants and apply carefully tailored equilibrium-seeking behavioral dynamics.
Fast and Furious Learning in Zero-Sum Games: Vanishing Regret with Non-Vanishing Step Sizes
- Computer Science
- NeurIPS
- 2019
We show for the first time, to our knowledge, that it is possible to reconcile in online learning in zero-sum games two seemingly contradictory objectives: vanishing time-average regret and…
Near-Optimal No-Regret Learning in General Games
- Computer Science, Economics
- NeurIPS
- 2021
Optimistic Hedge is shown to converge to coarse correlated equilibrium in general games at a rate of $\tilde{O}(1/T)$.
Multi-Agent Learning in Network Zero-Sum Games is a Hamiltonian System
- Economics
- AAMAS
- 2019
This work establishes a formal and robust connection between multi-agent systems and Hamiltonian dynamics, the same dynamics that describe conservative systems in physics. It provides a kind of Rosetta stone that helps to translate results and techniques between online optimization, convex analysis, game theory, and physics.
Learning in Games via Reinforcement and Regularization
- Economics
- Math. Oper. Res.
- 2016
This paper extends several properties of exponential learning, including the elimination of dominated strategies, the asymptotic stability of strict Nash equilibria, and the convergence of time-averaged trajectories in zero-sum games with an interior Nash equilibrium.
Finite Regret and Cycles with Fixed Step-Size via Alternating Gradient Descent-Ascent
- Computer Science
- COLT
- 2019
This paper shows that, in adversarial settings, agents' strategies are bounded and cycle when both agents use the alternating gradient descent algorithm, and that an agent that uses gradient descent obtains bounded regret.
Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization
- Computer Science
- ITCS
- 2019
It is shown that OMWU monotonically improves the Kullback-Leibler divergence of the current iterate to the (appropriately normalized) min-max solution until it enters a neighborhood of the solution and becomes a contracting map converging to the exact solution.
Chaos, Extremism and Optimism: Volume Analysis of Learning in Games
- Economics
- NeurIPS
- 2020
Two novel, rather negative properties of MWU in zero-sum games are proved. Extremism: even in games with a unique fully mixed Nash equilibrium, the system recurrently gets stuck near pure-strategy profiles, despite these being clearly unstable from a game-theoretic perspective. Unavoidability: the system cannot avoid bad points indefinitely.
Zero-Sum Polymatrix Games: A Generalization of Minmax
- Economics
- Math. Oper. Res.
- 2016
We show that in zero-sum polymatrix games, a multiplayer generalization of two-person zero-sum games, Nash equilibria can be found efficiently with linear programming. We also show that the set of…
Let's be honest: An optimal no-regret framework for zero-sum games
- Computer Science
- ICML
- 2018
A simple algorithmic framework is proposed that simultaneously achieves the best rates for honest regret as well as adversarial regret, and in addition resolves the open problem of removing the logarithmic terms in convergence to the value of the game.