• Corpus ID: 250420821

# Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback

@inproceedings{Lin2021DoublyON,
title={Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback},
author={Tianyi Lin and Zhengyuan Zhou and Wenjia Ba and Jiawei Zhang},
year={2021}
}
• Published 6 December 2021
• Computer Science
We consider online no-regret learning in unknown games with bandit feedback, where each player can only observe its reward at each time, determined by all players’ current joint action, rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of Θ̃(n√T) under…
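As background for the bandit-feedback setting the abstract describes, the sketch below shows the classic single-point gradient estimator that underlies this family of algorithms: a player who sees only a scalar reward at one perturbed action can still form an unbiased estimate of the gradient of a smoothed reward. This is an illustrative assumption about the general technique, not the paper's exact barrier-based construction; all names here are hypothetical.

```python
import numpy as np

def one_point_gradient_estimate(reward_fn, x, delta, rng):
    """Return (n/delta) * reward_fn(x + delta*u) * u for a random unit u.

    Its expectation equals the gradient of a delta-smoothed version of
    reward_fn at x, so it can drive gradient ascent from bandit feedback
    alone (one scalar reward per round, no gradient oracle).
    """
    n = x.shape[0]
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)  # uniform random direction on the unit sphere
    return (n / delta) * reward_fn(x + delta * u) * u

# Sanity check on a linear reward a -> c.a, whose true gradient is c:
# averaging many single-point estimates should recover c.
rng = np.random.default_rng(0)
c = np.array([2.0, -1.0])
reward = lambda a: c @ a
x0 = np.zeros(2)
estimates = [one_point_gradient_estimate(reward, x0, 0.1, rng)
             for _ in range(20000)]
avg = np.mean(estimates, axis=0)
print(avg)  # close to [2, -1]
```

In general the estimator's variance grows as the perturbation radius `delta` shrinks, which is the core bias-variance tension that optimal bandit algorithms, including barrier-based ones like the paper's, must manage.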
1 Citation
