Corpus ID: 211259183

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

@inproceedings{Lin2020FiniteTimeLC,
  title={Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games},
  author={Tianyi Lin and Zhengyuan Zhou and P. Mertikopoulos and Michael I. Jordan},
  booktitle={International Conference on Machine Learning},
  year={2020}
}
In this paper, we consider multi-agent learning via online gradient descent in a class of games called $\lambda$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games. We characterize the finite-time last-iterate convergence rate for joint OGD learning on $\lambda$-cocoercive games; further, building on this result, we develop a fully adaptive OGD learning algorithm that does not require any knowledge of… 
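
To make the learning rule concrete, below is a minimal Python sketch of joint OGD play on a toy strongly monotone (hence $\lambda$-cocoercive) quadratic game; the payoff gradients, step-size, and horizon are illustrative assumptions, not the paper's construction.

```python
# Minimal sketch (not the paper's exact setup): two players run online
# gradient descent simultaneously; the gradient oracles are hypothetical.
def grad_p1(x1, x2):
    return 2.0 * x1 + 0.5 * x2   # player 1's payoff gradient

def grad_p2(x1, x2):
    return 2.0 * x2 + 0.5 * x1   # player 2's payoff gradient

x1, x2 = 1.0, -1.0
eta = 0.1  # constant step-size; the paper's adaptive variant tunes this online
for t in range(1000):
    g1, g2 = grad_p1(x1, x2), grad_p2(x1, x2)  # joint (simultaneous) play
    x1, x2 = x1 - eta * g1, x2 - eta * g2      # per-player OGD update
# the last iterate (x1, x2) approaches (0, 0), this toy game's unique Nash equilibrium
```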

Tight last-iterate convergence rates for no-regret learning in multi-player games

The optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the gap function in smooth monotone games.
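
As a hedged illustration of the OG update (a toy example, not the paper's analysis), the following sketch runs optimistic gradient with a constant step-size on a bilinear saddle problem, the canonical smooth monotone game where plain gradient play fails to converge:

```python
def F(z):
    # monotone operator of the bilinear saddle problem min_x max_y x*y
    x, y = z
    return (y, -x)

z = (1.0, 1.0)
g_prev = F(z)
eta = 0.1  # constant step-size, as in the OG result quoted above
for t in range(2000):
    g = F(z)
    # optimistic step: extrapolate with 2*F(z_t) - F(z_{t-1})
    z = (z[0] - eta * (2 * g[0] - g_prev[0]),
         z[1] - eta * (2 * g[1] - g_prev[1]))
    g_prev = g
# plain gradient descent-ascent cycles on this game; the optimistic
# correction makes the last iterate spiral in to the equilibrium (0, 0)
```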

Gradient-free Online Learning in Games with Delayed Rewards

To account for the lack of a consistent stream of information, a gradient-free learning policy is introduced in which payoff information is placed in a priority queue as it arrives. The induced sequence of play is shown to converge to Nash equilibrium with probability $1$, even if the delay between choosing an action and receiving the corresponding reward is unbounded.
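
A minimal sketch of the queueing idea, assuming the learner simply buffers out-of-order feedback by the round in which it was generated (the function names are hypothetical):

```python
import heapq

# Delayed payoffs arrive out of order, so they are queued by the round in
# which the action was played and consumed oldest-first.
pending = []  # min-heap of (played_round, payoff)

def on_reward(played_round, payoff):
    heapq.heappush(pending, (played_round, payoff))

def next_feedback():
    return heapq.heappop(pending) if pending else None

on_reward(5, 0.3)       # the reward for round 5 arrives first...
on_reward(2, -0.1)      # ...then the older reward for round 2
print(next_feedback())  # (2, -0.1): the oldest feedback is processed first
```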

A Game-Theoretic Approach to Multi-Agent Trust Region Optimization

A multi-agent trust region learning method (MATRL) is proposed, which enables trust region optimization for multi-agent learning and finds a stable improvement direction guided by the solution concept of Nash equilibrium at the meta-game level.

Adaptive Learning in Continuous Games: Optimal Regret Bounds and Convergence to Nash Equilibrium

A range of no-regret policies based on optimistic mirror descent are proposed, with the following desirable properties: i) they do not require any prior tuning or knowledge of the game; ii) they all achieve $O(\sqrt{T})$ regret against arbitrary, adversarial opponents; and iii) they converge to the best response against convergent opponents.
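
For intuition, here is a hedged sketch of one such policy: optimistic exponential weights on the simplex with an AdaGrad-style step-size that needs no prior knowledge of the game. The exact policies and tuning in the paper differ.

```python
import numpy as np

def optimistic_exp_weights(grads):
    """Sketch of an adaptive optimistic mirror descent scheme on the simplex;
    the step-size adapts to observed gradient variation, so no game
    constants are needed up front."""
    d = len(grads[0])
    x = np.full(d, 1.0 / d)      # uniform prior over actions
    g_prev = np.zeros(d)
    s = 0.0                      # running sum of squared gradient variations
    for g in grads:
        s += float(np.dot(g - g_prev, g - g_prev))
        eta = 1.0 / np.sqrt(1.0 + s)             # adaptive, tuning-free step
        x = x * np.exp(-eta * (2 * g - g_prev))  # optimistic multiplicative step
        x /= x.sum()
        g_prev = g
    return x

print(optimistic_exp_weights([np.array([1.0, 0.0])] * 10))  # mass shifts to action 2
```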

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

The expected co-coercivity condition is introduced, its benefits are explained, and the first last-iterate convergence guarantees of SGDA and SCO under this condition are provided for solving a class of stochastic variational inequality problems that are potentially non-monotone.
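
A minimal sketch of simultaneous SGDA on a toy strongly monotone (hence expected co-coercive) saddle problem; the objective, noise model, and step-size schedule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def stoch_F(z, sigma=0.1):
    # Unbiased stochastic estimate of the descent-ascent field of the toy
    # problem min_x max_y  x**2/2 + x*y - y**2/2; Gaussian noise is assumed.
    x, y = z
    return np.array([x + y, y - x]) + sigma * rng.standard_normal(2)

z = np.array([2.0, -2.0])
for t in range(1, 5001):
    eta = 2.0 / (t + 20.0)      # decreasing step-size to average out the noise
    z = z - eta * stoch_F(z)    # simultaneous stochastic gradient descent-ascent
# the last iterate z settles near the saddle point (0, 0)
```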

Last Round Convergence and No-Dynamic Regret in Asymmetric Repeated Games

A no-dynamic-regret algorithm for the column player is developed that exhibits last-round convergence to a minimax equilibrium, and it is shown that this algorithm is efficient against a large set of popular no-regret algorithms that the row player can use.

Solving Structured Hierarchical Games Using Differential Backward Induction

The convergence properties of differential backward induction (DBI) are theoretically characterized and empirically validated, and the algorithm's effectiveness in finding globally stable solutions, as well as its scalability, is demonstrated on a recently introduced class of SHGs for pandemic policy making.

No-Regret Learning in Games with Noisy Feedback: Faster Rates and Adaptivity via Learning Rate Separation

This work proposes a fully adaptive method that smoothly interpolates between worst- and best-case regret guarantees and, via an optimistic gradient scheme with learning rate separation, achieves constant regret at a faster rate.

Finite-Time Last-Iterate Convergence for Learning in Multi-Player Games

The tangent residual is used as the potential function in the analysis of the extragradient algorithm (and the optimistic gradient algorithm), and it is shown that both algorithms achieve a last-iterate convergence rate of $O(1/\sqrt{T})$ to a Nash equilibrium in terms of the gap function in smooth monotone games.
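
The extragradient step itself is easy to state; below is a hedged sketch on a bilinear game (this illustrates the algorithm, not the paper's tangent-residual analysis):

```python
def F(z):
    # smooth monotone operator of the bilinear game min_x max_y x*y
    x, y = z
    return (y, -x)

z = (1.0, 1.0)
eta = 0.1
for t in range(2000):
    g = F(z)
    z_half = (z[0] - eta * g[0], z[1] - eta * g[1])       # exploratory half-step
    g_half = F(z_half)
    z = (z[0] - eta * g_half[0], z[1] - eta * g_half[1])  # extragradient update
# the last iterate approaches the equilibrium (0, 0), consistent with the
# O(1/sqrt(T)) gap-function guarantee quoted above
```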

References

Learning in Games with Lossy Feedback

A simple variant of the classical online gradient descent algorithm, called reweighted online gradient descent (ROGD), is proposed, and it is established that in variationally stable games, if each agent adopts ROGD, almost sure convergence to the set of Nash equilibria is guaranteed, even when the feedback loss is asynchronous and arbitrarily correlated among agents.
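
A hypothetical sketch of the reweighting idea, assuming feedback arrives independently with a known probability p (the paper's setting is more general):

```python
import random

def rogd_step(x, grad, received, p, eta):
    """Reweighted-OGD sketch: when feedback arrives (probability p), divide
    the gradient by p so the update stays unbiased in expectation; on a
    lost round, the iterate stays put."""
    if not received:
        return x
    return x - eta * grad / p

x, p, eta = 1.0, 0.7, 0.05
for t in range(2000):
    received = random.random() < p               # lossy feedback channel
    x = rogd_step(x, 2.0 * x, received, p, eta)  # gradient of the toy loss x**2
# x drifts to 0 despite the dropped feedback
```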

Mirror descent learning in continuous games

An equilibrium stability notion called variational stability (VS) is introduced, and it is shown that in variationally stable games the last iterate of online mirror descent (OMD) converges to the set of Nash equilibria.

Convergence of heterogeneous distributed learning in stochastic routing games

It is proved that if all players use the same sequence of learning rates, then their joint strategy converges almost surely to the equilibrium set, and upper bounds on the convergence rate are given.

Countering Feedback Delays in Multi-Agent Learning

To tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, this work proposes a variant of OMD which it calls delayed mirror descent (DMD), and which relies on the repeated leveraging of past information.

Learning in games with continuous action sets and unknown payoff functions

This paper focuses on learning via “dual averaging”, a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then “mirror” the output back to their action sets, and introduces the notion of variational stability.
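
For concreteness, a minimal sketch of dual averaging on the simplex with an entropic regularizer, where the "mirror" step is a softmax; the step-size and gradient stream are illustrative:

```python
import numpy as np

def dual_averaging_simplex(grads, eta=0.5):
    """'Lazy' mirror descent / dual averaging on the simplex: aggregate all
    past payoff gradients, then mirror the running score back to the
    action set through a softmax."""
    d = len(grads[0])
    y = np.zeros(d)                  # dual aggregate of gradient steps
    x = np.full(d, 1.0 / d)
    for g in grads:
        y -= eta * g                 # small step along the payoff gradient
        z = np.exp(y - y.max())      # numerically stable softmax mirror step
        x = z / z.sum()              # play the mirrored point on the simplex
    return x

print(dual_averaging_simplex([np.array([0.9, 0.1])] * 20))  # mass moves to action 2
```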

Limits and limitations of no-regret learning in games

It is shown that the ratio distribution has sharp decay, in the sense that most generated games have small ratios, and that games with large improvements from the best NE to the best CCE exhibit small degradation from the worst NE to the worst CCE.

Multi-Agent Online Learning with Imperfect Information

This paper considers a model of multi-agent online learning where the game is not known in advance, and the agents’ feedback is subject to both noise and delays, and proposes a variant of OMD which is called delayed mirror descent (DMD), and which relies on the repeated leveraging of past information.

Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos

This work analyzes MWU in congestion games where agents use arbitrary admissible constants as learning rates $\epsilon$ and proves convergence to the exact Nash equilibria.
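
A short sketch of the MWU rule in question, using the classical $(1-\epsilon)^{\text{cost}}$ update; the congestion costs below are stand-ins:

```python
import numpy as np

def mwu_step(x, costs, eps):
    """Multiplicative weights update with a constant step-size eps: actions
    with lower cost gain probability mass multiplicatively."""
    w = x * (1.0 - eps) ** costs   # the classical (1 - eps)**cost rule
    return w / w.sum()

x = np.full(3, 1.0 / 3.0)
for t in range(200):
    costs = np.array([0.9, 0.5, 0.1])  # stand-in congestion costs in [0, 1]
    x = mwu_step(x, costs, eps=0.1)
# play concentrates on the cheapest action; the paper proves convergence for
# admissible constant eps and exhibits limit cycles and chaos outside that regime
```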

Distributed Stochastic Optimization via Matrix Exponential Learning

The proposed algorithm relies on the method of matrix exponential learning (MXL) and only requires locally computable gradient observations that are possibly imperfect; it is globally convergent to stable equilibria, or locally convergent when an equilibrium is only locally stable.
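
An illustrative form of the MXL step, assuming a unit-trace positive-definite feasible set (as in covariance or power-allocation problems); the gradient observation below is hypothetical:

```python
import numpy as np
from scipy.linalg import expm

def mxl_step(Y, V, eta):
    """One matrix exponential learning step: aggregate the possibly imperfect
    gradient matrix V in the dual variable Y, then map back to a unit-trace
    positive-definite matrix via the matrix exponential."""
    Y = Y + eta * V
    X = expm(Y)                 # guarantees positive definiteness
    return Y, X / np.trace(X)   # normalize to unit trace

Y = np.zeros((2, 2))
V = np.array([[1.0, 0.0], [0.0, -1.0]])  # hypothetical gradient observation
Y, X = mxl_step(Y, V, eta=0.1)
print(X)  # a feasible matrix tilted toward the observed gradient
```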

On the Convergence of Mirror Descent beyond Stochastic Convex Programming

An interesting insight is revealed regarding the convergence speed of SMD: in problems with sharp minima, SMD reaches a minimum point in a finite number of steps (a.s.), even in the presence of persistent gradient noise.