# Toward a classification of finite partial-monitoring games

@inproceedings{Bartk2010TowardAC,
title={Toward a classification of finite partial-monitoring games},
author={G{\'a}bor Bart{\'o}k and D{\'a}vid P{\'a}l and Csaba Szepesvari},
booktitle={Theoretical Computer Science},
year={2010}
}
• Published in Theoretical Computer Science 6 October 2010
• Computer Science, Mathematics
• Computer Science
COLT
• 2011
A computationally efficient learning algorithm is provided that achieves the minimax regret within logarithmic factor for any game with finitely many actions and outcomes.
A new algorithm for locally observable partial monitoring games is presented, and it is proved that the expected regret of the algorithm is $O(\sqrt{N'T})$, where $T$ is the time horizon and $N'$ is the size of the largest point-local game.
• Tian Lin, Wei Chen
• Computer Science
ICML
• 2014
The Global Confidence Bound (GCB) algorithm is presented, which integrates ideas from both combinatorial multi-armed bandits and finite partial monitoring games to handle all the above issues and can be applied to a wide range of combinatorial applications constrained with limited feedback.
An instantiation of the CBP-SIDE algorithm that uses linear least squares estimators to determine the best possible action given the side information is implemented and compared, on a variety of problem instances, against a leading contextual bandit algorithm used as a baseline.
This thesis investigates to what extent the information received influences the best achievable cumulative loss suffered by an optimal player, presents algorithms that have theoretical guarantees for achieving low cumulative loss, and proves their optimality by providing matching, algorithm-independent lower bounds.
• Computer Science
• 2013
An instantiation of the CBP-SIDE algorithm that uses linear least squares estimators to determine the best possible action given the side information is implemented and compared, on a variety of problem instances, against a leading contextual bandit algorithm used as a baseline.
• Computer Science, Mathematics
ArXiv
• 2011
It is shown that when the number of actions available to the learner is two and the game is nontrivial then it is reducible to a bandit-like game and thus the minimax regret is $\Theta(\sqrt{T})$.
• Computer Science
COLT
• 2020
This work introduces information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure and proves lower bounds that classify the minimax regret of all finite games into four possible regimes.
• Computer Science, Mathematics
COLT
• 2019
We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary.
• Computer Science
AISTATS
• 2012
An algorithm is presented which attains $O(\sqrt{T})$ internal (and thus external) regret for finite games with partial monitoring under the local observability condition, and regret guarantees also hold for the more general model of partial monitoring with random signals.

## References


• Computer Science
COLT
• 2011
A computationally efficient learning algorithm is provided that achieves the minimax regret within logarithmic factor for any game with finitely many actions and outcomes.
• Economics, Computer Science
2006 IEEE Information Theory Workshop - ITW '06 Punta del Este
• 2006
A general lower bound for the convergence rate of the regret is proved, and a specific strategy that attains this rate for any game for which a Hannan consistent player exists is exhibited.
• Computer Science, Mathematics
ArXiv
• 2011
It is shown that when the number of actions available to the learner is two and the game is nontrivial then it is reducible to a bandit-like game and thus the minimax regret is $\Theta(\sqrt{T})$.
• Computer Science
COLT/EuroCOLT
• 2001
This work investigates the problem of predicting a sequence when the information about the previous elements (feedback) is only partial and possibly dependent on the predicted values, and evaluates the performance against the best constant predictor (regret), as is common in iterated game analysis.
• Computer Science
AISTATS
• 2012
An algorithm is presented which attains $O(\sqrt{T})$ internal (and thus external) regret for finite games with partial monitoring under the local observability condition, and regret guarantees also hold for the more general model of partial monitoring with random signals.
• Computer Science, Mathematics
NIPS
• 2008
The results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$.
• Computer Science, Economics
SIAM J. Comput.
• 2002
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
• Computer Science
Math. Oper. Res.
• 2008
The forecasters presented here offer the first constructive proof of consistency and their algorithms are computationally efficient and optimal up to logarithmic terms.
• Computer Science, Mathematics
STOC
• 2008
This work defines an isometry invariant Max Min COV(X) which bounds from below the performance of Lipschitz MAB algorithms for X, and presents an algorithm which comes arbitrarily close to meeting this bound.
• Computer Science
IEEE Transactions on Information Theory
• 2005
It is proved that Hannan consistency, a fundamental property in game-theoretic prediction models, can be achieved by a forecaster issuing a number of queries growing to infinity at a rate just slightly faster than logarithmic in the number of prediction rounds.