• Corpus ID: 211259183

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

@inproceedings{Lin2020FiniteTimeLC,
  title     = {Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games},
  author    = {Tianyi Lin and Zhengyuan Zhou and P. Mertikopoulos and Michael I. Jordan},
  booktitle = {International Conference on Machine Learning},
  year      = {2020}
}
• Published in International Conference on Machine Learning, 23 February 2020
• Computer Science
In this paper, we consider multi-agent learning via online gradient descent (OGD) in a class of games called $\lambda$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games. We characterize the finite-time last-iterate convergence rate for joint OGD learning in $\lambda$-cocoercive games; further, building on this result, we develop a fully adaptive OGD learning algorithm that does not require any knowledge of…
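The joint OGD dynamics in the abstract can be sketched on a hypothetical example: a two-player unconstrained strongly monotone game (a special case of the $\lambda$-cocoercive games considered in the paper). The specific payoffs, step-size, and iteration count below are illustrative assumptions, not taken from the paper.

```python
# Sketch: simultaneous online gradient descent (OGD) in a hypothetical
# 2-player game. Player 1 minimizes f1(x, y) = x^2 + x*y over x;
# player 2 minimizes f2(x, y) = y^2 - x*y over y. The joint gradient
# field F(x, y) = (2x + y, 2y - x) is strongly monotone (the cross
# terms cancel), and the unique Nash equilibrium is (0, 0).

def F(x, y):
    """Joint gradient field: each player's gradient in its own variable."""
    return 2 * x + y, 2 * y - x

def ogd(x, y, eta=0.1, rounds=200):
    """Both players simultaneously take a gradient step each round."""
    for _ in range(rounds):
        gx, gy = F(x, y)
        x, y = x - eta * gx, y - eta * gy
    return x, y

x, y = ogd(1.0, 1.0)  # the last iterate approaches the equilibrium (0, 0)
```

Strong monotonicity is what makes the plain simultaneous step contract here; the paper's cocoercive setting is broader than this sketch.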
25 Citations
• Computer Science · NeurIPS · 2020
The optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the gap function in smooth monotone games.
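The optimistic gradient step mentioned above can be illustrated on a minimal smooth monotone game. The bilinear payoff, step-size, and iteration budget below are hypothetical choices for the sketch; plain gradient descent-ascent spirals away from the equilibrium of this game, while OG's last iterate converges.

```python
# Sketch: optimistic gradient (OG) with a constant step-size on the
# hypothetical bilinear game min_x max_y x*y, whose monotone operator
# is F(x, y) = (y, -x) and whose Nash equilibrium is (0, 0).

def F(x, y):
    return y, -x

def og(x, y, eta=0.1, rounds=2000):
    px, py = F(x, y)  # previously observed gradient
    for _ in range(rounds):
        gx, gy = F(x, y)
        # OG step: extrapolate using the previous gradient
        x, y = x - eta * (2 * gx - px), y - eta * (2 * gy - py)
        px, py = gx, gy
    return x, y

x, y = og(1.0, 1.0)
gap = (x * x + y * y) ** 0.5  # distance of the last iterate to (0, 0)
```

The extrapolation term `2*g - p` is what damps the rotation that defeats plain simultaneous gradient play on bilinear games.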
• Computer Science · ArXiv · 2020
To account for the lack of a consistent stream of information, a gradient-free learning policy is introduced in which payoff information is placed in a priority queue as it arrives; it is shown that the induced sequence of play converges to Nash equilibrium with probability $1$, even if the delay between choosing an action and receiving the corresponding reward is unbounded.
• Computer Science · ArXiv · 2021
A multi-agent trust region learning method (MATRL) is proposed, which enables trust-region optimization for multi-agent learning and finds a stable improvement direction guided by the solution concept of Nash equilibrium at the meta-game level.
• Economics · COLT · 2021
A range of no-regret policies based on optimistic mirror descent are proposed, with the following desirable properties: i) they do not require any prior tuning or knowledge of the game; ii) they all achieve $O(\sqrt{T})$ regret against arbitrary, adversarial opponents; and iii) they converge to the best response against convergent opponents.
• Computer Science, Mathematics · NeurIPS · 2021
The expected co-coercivity condition is introduced, its benefits are explained, and the first last-iterate convergence guarantees of SGDA and SCO under this condition are provided for solving a class of stochastic variational inequality problems that are potentially non-monotone.
• Computer Science · ALT · 2021
A no-dynamic-regret algorithm is developed for the column player that exhibits last-round convergence to a minimax equilibrium, and it is shown that this algorithm is efficient against a large set of popular no-regret algorithms that the row player can use.
• Economics · UAI · 2022
The convergence properties of DBI are theoretically characterized and empirically demonstrated, and the algorithm's effectiveness in finding globally stable solutions and its scalability are demonstrated on a recently introduced class of SHGs for pandemic policy making.
• Computer Science · ArXiv · 2022
This work proposes a fully adaptive method that smoothly interpolates between worst- and best-case regret guarantees and achieves constant regret at a faster rate via an optimistic gradient scheme with learning-rate separation.
The tangent residual is used as the potential function in the analysis of the extragradient algorithm (and of the optimistic gradient algorithm), and it is shown that both algorithms have a last-iterate convergence rate of $O(1/\sqrt{T})$ to a Nash equilibrium in terms of the gap function in smooth monotone games.
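The extragradient algorithm analyzed in the summary above can be sketched on the same kind of smooth monotone game. The bilinear payoff, step-size, and iteration count are hypothetical choices for illustration only.

```python
# Sketch: extragradient (EG) on the hypothetical bilinear game
# min_x max_y x*y, with monotone operator F(x, y) = (y, -x) and
# Nash equilibrium (0, 0). EG takes a look-ahead half step, then a
# full step using the gradient at the look-ahead point.

def F(x, y):
    return y, -x

def extragradient(x, y, eta=0.1, rounds=2000):
    for _ in range(rounds):
        gx, gy = F(x, y)
        # half step to a look-ahead point
        hx, hy = x - eta * gx, y - eta * gy
        # full step from the original point, using the look-ahead gradient
        gx, gy = F(hx, hy)
        x, y = x - eta * gx, y - eta * gy
    return x, y

x, y = extragradient(1.0, 1.0)
gap = (x * x + y * y) ** 0.5  # distance of the last iterate to (0, 0)
```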

References

Showing 1–10 of 40 references

• Computer Science · NeurIPS · 2018
A simple variant of the classical online gradient descent algorithm, called reweighted online gradient descent (ROGD), is proposed, and it is established that in variationally stable games, if each agent adopts ROGD, almost sure convergence to the set of Nash equilibria is guaranteed, even when the feedback loss is asynchronous and arbitrarily correlated among agents.
• Computer Science · 2017 IEEE 56th Annual Conference on Decision and Control (CDC)
An equilibrium stability notion called variational stability (VS) is introduced and it is shown that in variationally stable games, the last iterate of OMD converges to the set of Nash equilibria.
• Economics · 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton)
It is proved that if all players use the same sequence of learning rates, then their joint strategy converges almost surely to the equilibrium set, and upper bounds on the convergence rate are given.
• Computer Science · NIPS · 2017
To tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, this work proposes a variant of OMD which it calls delayed mirror descent (DMD), and which relies on the repeated leveraging of past information.
• Economics, Mathematics · Math. Program. · 2019
This paper focuses on learning via “dual averaging”, a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then “mirror” the output back to their action sets, and introduces the notion of variational stability.
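The "dual averaging" scheme described above — accumulate payoff gradients, then "mirror" the sum back to the action set — can be sketched with an entropic regularizer on the simplex, where the mirror step is a softmax. The single-player stream with a fixed loss vector below is a hypothetical example, not taken from the reference.

```python
# Sketch: dual averaging with an entropic mirror map on the simplex.
# The player accumulates (negated) loss gradients in the dual space;
# the mirror step maps the running sum back to a mixed strategy.
import math

def mirror(scores):
    """Entropic mirror step: softmax of the cumulative dual scores."""
    m = max(scores)                      # subtract max for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [v / z for v in w]

def dual_averaging(losses, eta=0.1, rounds=200):
    scores = [0.0] * len(losses)
    x = mirror(scores)                   # uniform initial strategy
    for _ in range(rounds):
        g = losses                       # observed gradient (a constant loss vector here)
        scores = [s - eta * gi for s, gi in zip(scores, g)]
        x = mirror(scores)
    return x

x = dual_averaging([1.0, 0.5, 0.0])      # mass concentrates on the least-loss action
```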
• Economics · The Knowledge Engineering Review · 2017
It is shown that the ratio distribution has sharp decay, in the sense that most generated games have small ratios, and that games with large improvements from the best NE to the best CCE present small degradation from the worst NE to the worst CCE.
• Computer Science · 2018
This paper considers a model of multi-agent online learning where the game is not known in advance, and the agents’ feedback is subject to both noise and delays, and proposes a variant of OMD which is called delayed mirror descent (DMD), and which relies on the repeated leveraging of past information.
• Computer Science · NIPS · 2017
This work analyzes MWU in congestion games where agents use arbitrary admissible constants as learning rates $\epsilon$ and proves convergence to the exact Nash equilibria.
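The MWU dynamics in the summary above can be sketched on a hypothetical 2-player, 2-facility congestion game in which a facility's cost equals its load; the pure Nash equilibria put the players on different facilities. The starting strategies and learning rate below are illustrative assumptions.

```python
# Sketch: multiplicative weights update (MWU) with a constant learning
# rate in a hypothetical symmetric congestion game. Each player keeps a
# mixed strategy over 2 facilities and discounts each facility's weight
# by (1 - eps)^cost, where the expected cost of a facility is 1 (own
# load) plus the probability that the other player also picks it.

def mwu_step(p, costs, eps=0.1):
    w = [pi * (1 - eps) ** c for pi, c in zip(p, costs)]
    z = sum(w)
    return [wi / z for wi in w]

def play(rounds=500):
    p1, p2 = [0.6, 0.4], [0.4, 0.6]   # slightly asymmetric start
    for _ in range(rounds):
        c1 = [1 + p2[0], 1 + p2[1]]   # player 1's expected facility costs
        c2 = [1 + p1[0], 1 + p1[1]]   # player 2's expected facility costs
        p1, p2 = mwu_step(p1, c1), mwu_step(p2, c2)
    return p1, p2

p1, p2 = play()  # play concentrates on the pure Nash equilibrium
```

From the asymmetric start, each player's cheaper facility gets discounted less, so the strategies separate onto distinct facilities — the anti-coordination equilibrium of the game.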
• Computer Science · IEEE Transactions on Signal Processing · 2017
The proposed algorithm relies on the method of matrix exponential learning (MXL) and only requires locally computable gradient observations that are possibly imperfect and is globally convergent to such equilibria—or locally convergent when an equilibrium is only locally stable.
• Mathematics, Computer Science · SIAM J. Optim. · 2020
An interesting insight is revealed regarding the convergence speed of SMD: in problems with sharp minima, SMD reaches a minimum point in a finite number of steps (a.s.), even in the presence of persistent gradient noise.