93 Citations
Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
- Computer Science · COLT
- 2011
A computationally efficient learning algorithm is provided that achieves the minimax regret within a logarithmic factor for any game with finitely many actions and outcomes.
A near-optimal algorithm for finite partial-monitoring games against adversarial opponents
- Computer Science · COLT
- 2013
A new algorithm for locally observable partial monitoring games is presented, and it is proved that the expected regret of the algorithm is $O(\sqrt{N'T})$, where $T$ is the time horizon and $N'$ is the size of the largest point-local game.
Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications
- Computer Science · ICML
- 2014
The Global Confidence Bound (GCB) algorithm is presented, which integrates ideas from both combinatorial multi-armed bandits and finite partial monitoring games to handle all the above issues and can be applied to a wide range of combinatorial applications constrained with limited feedback.
Exploiting Side Information in Partial Monitoring Games
- Computer Science
- 2013
An instantiation of the CBP-SIDE algorithm that uses linear least squares estimators to determine the best possible action given the side information is implemented and compared, on a variety of problem instances, against a leading contextual bandit algorithm used as a baseline.
The Role of Information in Online Learning
- Computer Science
- 2012
This thesis investigates to what extent the information received influences the best achievable cumulative loss suffered by an optimal player, presents algorithms that have theoretical guarantees for achieving low cumulative loss, and proves their optimality by providing matching, algorithm-independent lower bounds.
Exploiting Side Information in Partial Monitoring Games: An Empirical Study of the CBP-SIDE Algorithm with Applications to Procurement
- Computer Science
- 2013
An instantiation of the CBP-SIDE algorithm that uses linear least squares estimators to determine the best possible action given the side information is implemented and compared, on a variety of problem instances, against a leading contextual bandit algorithm used as a baseline.
Non-trivial two-armed partial-monitoring games are bandits
- Computer Science, Mathematics · ArXiv
- 2011
It is shown that when the number of actions available to the learner is two and the game is nontrivial then it is reducible to a bandit-like game and thus the minimax regret is $\Theta(\sqrt{T})$.
Information Directed Sampling for Linear Partial Monitoring
- Computer Science · COLT
- 2020
This work introduces information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure and proves lower bounds that classify the minimax regret of all finite games into four possible regimes.
An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
- Computer Science, Mathematics · COLT
- 2019
We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then…
No Internal Regret via Neighborhood Watch
- Computer Science · AISTATS
- 2012
An algorithm is presented which attains $O(\sqrt{T})$ internal (and thus external) regret for finite games with partial monitoring under the local observability condition, and regret guarantees also hold for the more general model of partial monitoring with random signals.
References
Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
- Computer Science · COLT
- 2011
A computationally efficient learning algorithm is provided that achieves the minimax regret within a logarithmic factor for any game with finitely many actions and outcomes.
Regret Minimization Under Partial Monitoring
- Economics, Computer Science · 2006 IEEE Information Theory Workshop - ITW '06 Punta del Este
- 2006
A general lower bound for the convergence rate of the regret is proved, and a specific strategy that attains this rate for any game for which a Hannan consistent player exists is exhibited.
Non-trivial two-armed partial-monitoring games are bandits
- Computer Science, Mathematics · ArXiv
- 2011
It is shown that when the number of actions available to the learner is two and the game is nontrivial then it is reducible to a bandit-like game and thus the minimax regret is $\Theta(\sqrt{T})$.
Discrete Prediction Games with Arbitrary Feedback and Loss
- Computer Science · COLT/EuroCOLT
- 2001
This work investigates the problem of predicting a sequence when the information about the previous elements (feedback) is only partial and possibly dependent on the predicted values, and evaluates the performance against the best constant predictor (regret), as is common in iterated game analysis.
No Internal Regret via Neighborhood Watch
- Computer Science · AISTATS
- 2012
An algorithm is presented which attains $O(\sqrt{T})$ internal (and thus external) regret for finite games with partial monitoring under the local observability condition, and regret guarantees also hold for the more general model of partial monitoring with random signals.
Online Optimization in X-Armed Bandits
- Computer Science, Mathematics · NIPS
- 2008
The results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$.
The Nonstochastic Multiarmed Bandit Problem
- Computer Science, Economics · SIAM J. Comput.
- 2002
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.
Strategies for Prediction Under Imperfect Monitoring
- Computer Science · Math. Oper. Res.
- 2008
The forecasters presented here offer the first constructive proof of consistency and their algorithms are computationally efficient and optimal up to logarithmic terms.
Multi-armed bandits in metric spaces
- Computer Science, Mathematics · STOC
- 2008
This work defines an isometry invariant Max Min COV(X) which bounds from below the performance of Lipschitz MAB algorithms for X, and presents an algorithm which comes arbitrarily close to meeting this bound.
Minimizing regret with label efficient prediction
- Computer Science · IEEE Transactions on Information Theory
- 2005
It is proved that Hannan consistency, a fundamental property in game-theoretic prediction models, can be achieved by a forecaster issuing a number of queries growing to infinity at a rate just slightly faster than logarithmic in the number of prediction rounds.