Corpus ID: 250420821

Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

Tianyi Lin, Zhengyuan Zhou, Wenjia Ba, Jiawei Zhang
We consider online no-regret learning in unknown games with bandit feedback, where each player can only observe its realized reward at each time, determined by all players' current joint action, rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of Θ̃(n√T) under…
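The abstract's key primitive is a gradient estimate built from a single reward observation per round. The paper's actual construction uses a self-concordant barrier and is not reproduced here; what follows is only a sketch of the classic one-point spherical estimator that such methods refine, with an illustrative quadratic reward and made-up parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)


def one_point_gradient_estimate(f, x, delta, u):
    """One-point bandit estimator: (n / delta) * f(x + delta*u) * u is an
    unbiased estimate of the gradient of a smoothed version of f, computed
    from a single reward observation per round."""
    n = x.shape[0]
    return (n / delta) * f(x + delta * u) * u


# Illustrative strongly concave reward (a stand-in for one player's payoff).
x_star = np.array([0.5, -0.3])
f = lambda x: -np.sum((x - x_star) ** 2, axis=-1)

# For a quadratic reward the estimator is exactly unbiased, so its Monte
# Carlo average should recover the true payoff gradient.
x = np.array([0.2, 0.1])
delta, N = 0.05, 200_000
U = rng.normal(size=(N, 2))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform directions on the sphere
g_bar = (2 / delta) * (f(x + delta * U)[:, None] * U).mean(axis=0)
g_true = -2.0 * (x - x_star)                    # analytic gradient of f at x
```

Note the (n/δ) scaling: as the query radius δ shrinks, the estimator's variance grows, which is the obstacle that makes single-point bandit feedback strictly harder than gradient feedback.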

Doubly Optimal No-Regret Learning in Monotone Games

Bandit learning in concave N-person games

This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games and derives an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.

Tight last-iterate convergence rates for no-regret learning in multi-player games

The optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the gap function in smooth monotone games.
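The OG update can be illustrated on the simplest smooth monotone game, a scalar bilinear zero-sum game f(x, y) = x·y (this toy game and the step-size are illustrative, not the paper's setting). Plain gradient descent–ascent spirals outward here, while OG's extra extrapolation term makes the last iterate contract toward the equilibrium at the origin:

```python
import numpy as np

# Bilinear zero-sum game f(x, y) = x * y: x minimizes, y maximizes.
# Unique Nash equilibrium at (0, 0). Gradients: df/dx = y, df/dy = x.
eta = 0.1
x, y = 1.0, 1.0
gx_prev, gy_prev = y, x        # seed the "past gradient" with the initial one

for _ in range(1000):
    gx, gy = y, x              # current gradients, before the simultaneous update
    # Optimistic gradient: step along 2*g_t - g_{t-1} instead of g_t alone.
    x, y = x - eta * (2 * gx - gx_prev), y + eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy

dist = np.hypot(x, y)          # distance of the last iterate to the equilibrium
```

With vanilla gradient descent–ascent the same loop has iterates whose distance to the origin grows every step; the single remembered gradient is all that separates the two dynamics.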

Learning in games with continuous action sets and unknown payoff functions

This paper focuses on learning via “dual averaging”, a widely used class of no-regret learning schemes where players take small steps along their individual payoff gradients and then “mirror” the output back to their action sets, and introduces the notion of variational stability.
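A minimal single-player sketch of the dual-averaging template this summary describes (the payoff vector, step-size, and horizon are invented for illustration): a score vector accumulates gradient steps, and the entropic mirror map (softmax) sends the aggregate score back to the action set, here the probability simplex.

```python
import numpy as np


def softmax(y):
    """Entropic mirror map: sends an aggregate score vector back to the simplex."""
    z = np.exp(y - y.max())    # subtract the max for numerical stability
    return z / z.sum()


# Illustrative fixed payoff vector over three actions; the gradient of the
# linear expected payoff <v, x> is just v itself.
v = np.array([0.2, 0.5, 0.3])
eta = 0.1

y = np.zeros(3)                # aggregate score (the "dual" variable)
for _ in range(500):
    x = softmax(y)             # mirror the score back to a mixed action
    grad = v                   # payoff gradient at x (constant here)
    y = y + eta * grad         # small step along the payoff gradient

x = softmax(y)                 # final mixed action
```

With a fixed payoff vector the aggregate score grows linearly in the best action's favor, so the mirrored iterate concentrates on the highest-payoff action.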

Learning in Games with Lossy Feedback

A simple variant of the classical online gradient descent algorithm, called reweighted online gradient descent (ROGD), is proposed, and it is established that in variationally stable games, if each agent adopts ROGD, almost sure convergence to the set of Nash equilibria is guaranteed, even when the feedback loss is asynchronous and arbitrarily correlated among agents.

Mirror descent learning in continuous games

An equilibrium stability notion called variational stability (VS) is introduced and it is shown that in variationally stable games, the last iterate of OMD converges to the set of Nash equilibria.

Learning with minimal information in continuous games

This paper designs a stochastic learning process called the dampened gradient approximation process for games with continuous action sets and shows that despite such limited information, players will converge to Nash in large classes of games.

Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games

This paper provides the first set of results filling several gaps in the existing multi-agent online learning literature, covering three previously unexplored aspects: finite-time convergence rates, non-decreasing step-sizes, and fully adaptive algorithms.

Countering Feedback Delays in Multi-Agent Learning

To tackle fully decentralized, asynchronous environments with (possibly) unbounded delays between actions and feedback, this work proposes a variant of OMD which it calls delayed mirror descent (DMD), and which relies on the repeated leveraging of past information.

Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games

This paper studies a general adversarial online learning problem in which a decision set X′ in a reflexive Banach space X and a sequence of reward vectors in the dual space of X are given, and shows that if both players play a Hannan-consistent strategy, the empirical distributions of play weakly converge to the set of Nash equilibria of the game.

Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback

The multi-point bandit setting, in which the player can query each loss function at multiple points, is introduced, and regret bounds that closely resemble bounds for the full information case are proved.
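The multi-point idea can be sketched with the standard two-point estimator (the quadratic loss below is an illustrative stand-in): querying the loss at x ± δu and differencing cancels the value of f itself, leaving an estimate whose variance stays bounded as δ → 0, which is what makes full-information-like regret possible.

```python
import numpy as np

rng = np.random.default_rng(1)


def two_point_gradient_estimate(f, x, delta, u):
    """Two-point bandit estimator: (n / (2*delta)) * (f(x+delta*u) - f(x-delta*u)) * u.
    Differencing removes the f(x) term that dominates the one-point
    estimator's variance."""
    n = x.shape[0]
    return (n / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u


# Illustrative quadratic loss; for quadratics the estimator is exactly unbiased.
x_star = np.array([0.5, -0.3])
f = lambda x: np.sum((x - x_star) ** 2, axis=-1)

x = np.array([0.2, 0.1])
delta, N = 0.05, 50_000
U = rng.normal(size=(N, 2))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # uniform directions on the sphere
diffs = f(x + delta * U) - f(x - delta * U)
g_bar = (2 / (2 * delta)) * (diffs[:, None] * U).mean(axis=0)
g_true = 2.0 * (x - x_star)                     # analytic gradient of f at x
```

Unlike the one-point case, each individual estimate here has norm at most n·‖∇f‖ for a quadratic, independent of δ, so far fewer samples are needed for the same accuracy.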