Toward a classification of finite partial-monitoring games

@inproceedings{Bartk2010TowardAC,
  title={Toward a classification of finite partial-monitoring games},
  author={G{\'a}bor Bart{\'o}k and D{\'a}vid P{\'a}l and Csaba Szepesv{\'a}ri},
  booktitle={Algorithmic Learning Theory},
  year={2010}
}

Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments

A computationally efficient learning algorithm is provided that achieves the minimax regret within a logarithmic factor for any game with finitely many actions and outcomes.
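
(Background, not part of the cited abstracts: the regret measured throughout these summaries is the standard one for a finite partial-monitoring game with loss matrix L and feedback matrix H, where in round t the learner picks action $I_t$, the environment picks outcome $J_t$, and the learner observes only the signal $H(I_t, J_t)$. The regret after T rounds is
$R_T = \mathbb{E}\left[\sum_{t=1}^{T} L(I_t, J_t)\right] - \min_{i} \mathbb{E}\left[\sum_{t=1}^{T} L(i, J_t)\right],$
the expected excess cumulative loss relative to the best single action; the minimax regret is this quantity for the best learner against the worst environment.)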

A near-optimal algorithm for finite partial-monitoring games against adversarial opponents

A new algorithm for locally observable partial-monitoring games is presented, and it is proved that the expected regret of the algorithm is $O(\sqrt{N'T})$, where T is the time horizon and N' is the size of the largest point-local game.

Combinatorial Partial Monitoring Game with Linear Feedback and Its Applications

The Global Confidence Bound (GCB) algorithm is presented, which integrates ideas from both combinatorial multi-armed bandits and finite partial-monitoring games, and can be applied to a wide range of combinatorial applications constrained by limited feedback.

Exploiting Side Information in Partial Monitoring Games

An instantiation of the CBP-SIDE algorithm that uses linear least-squares estimators to determine the best action given the side information is implemented and evaluated on a variety of problem instances, using a leading contextual bandit algorithm as a baseline.

The Role of Information in Online Learning

This thesis investigates to what extent the information received influences the best achievable cumulative loss of an optimal player, presents algorithms with theoretical guarantees for achieving low cumulative loss, and proves their optimality via matching, algorithm-independent lower bounds.

Non-trivial two-armed partial-monitoring games are bandits

It is shown that when the number of actions available to the learner is two and the game is non-trivial, the game is reducible to a bandit-like game and thus its minimax regret is $\Theta(\sqrt{T})$.

Information Directed Sampling for Linear Partial Monitoring

This work introduces information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure and proves lower bounds that classify the minimax regret of all finite games into four possible regimes.
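
(For orientation: the four regimes referenced here are the standard classification from the finite partial-monitoring literature, not restated in the abstract above. Up to logarithmic factors, every finite game has minimax regret
$R_T^{*} \in \{\,0,\ \Theta(\sqrt{T}),\ \Theta(T^{2/3}),\ \Theta(T)\,\},$
corresponding to trivial, locally observable ("easy"), globally but not locally observable ("hard"), and hopeless games, respectively.)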

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary.

No Internal Regret via Neighborhood Watch

An algorithm is presented which attains $O(\sqrt{T})$ internal (and thus external) regret for finite games with partial monitoring under the local observability condition, and the regret guarantees also hold for the more general model of partial monitoring with random signals.
...

References

Showing 1–10 of 40 references

Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments

A computationally efficient learning algorithm is provided that achieves the minimax regret within a logarithmic factor for any game with finitely many actions and outcomes.

Regret Minimization Under Partial Monitoring

A general lower bound on the convergence rate of the regret is proved, and a specific strategy that attains this rate is exhibited for any game for which a Hannan-consistent player exists.

Non-trivial two-armed partial-monitoring games are bandits

It is shown that when the number of actions available to the learner is two and the game is non-trivial, the game is reducible to a bandit-like game and thus its minimax regret is $\Theta(\sqrt{T})$.

Discrete Prediction Games with Arbitrary Feedback and Loss

This work investigates the problem of predicting a sequence when the information about the previous elements (feedback) is only partial and possibly dependent on the predicted values, and evaluates the performance against the best constant predictor (regret), as is common in iterated game analysis.

No Internal Regret via Neighborhood Watch

An algorithm is presented which attains $O(\sqrt{T})$ internal (and thus external) regret for finite games with partial monitoring under the local observability condition, and the regret guarantees also hold for the more general model of partial monitoring with random signals.

Online Optimization in X-Armed Bandits

The results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Hölder with a known exponent, then the expected regret is bounded, up to a logarithmic factor, by $\sqrt{n}$.

The Nonstochastic Multiarmed Bandit Problem

A solution is given to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs.

Strategies for Prediction Under Imperfect Monitoring

The forecasters presented here offer the first constructive proof of consistency and their algorithms are computationally efficient and optimal up to logarithmic terms.

Multi-armed bandits in metric spaces

This work defines an isometry invariant MaxMinCOV(X), which bounds from below the performance of Lipschitz MAB algorithms for X, and presents an algorithm which comes arbitrarily close to meeting this bound.

Minimizing regret with label efficient prediction

It is proved that Hannan consistency, a fundamental property in game-theoretic prediction models, can be achieved by a forecaster issuing a number of queries growing to infinity at a rate just slightly faster than logarithmic in the number of prediction rounds.