# Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

@article{Tsuchiya2020AnalysisAD, title={Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring}, author={Taira Tsuchiya and Junya Honda and Masashi Sugiyama}, journal={ArXiv}, year={2020}, volume={abs/2006.09668} }

We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms for a variety of online decision-making problems, its properties for stochastic partial monitoring have not been theoretically investigated, and the existing algorithm relies on a heuristic approximation of the posterior distribution. To mitigate these problems, we present a novel Thompson-sampling-based…
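To make the setting concrete, here is a minimal sketch of plain Thompson sampling on a toy partial-monitoring game. It is not the algorithm proposed in this paper (the abstract above is truncated before describing it). The loss matrix `L`, signal matrix `H`, the two-outcome restriction, and the grid discretization of the posterior over the outcome distribution are all assumptions chosen so that the posterior can be computed exactly without the heuristic approximation the abstract mentions.

```python
import random

# Hypothetical two-outcome game (not from the paper). L[a][j] is the loss of
# action a under outcome j; H[a][j] is the signal the learner observes.
L = [[0.0, 1.0],   # action 0: right when outcome 0 occurs
     [1.0, 0.0],   # action 1: right when outcome 1 occurs
     [0.3, 0.3]]   # action 2: costs 0.3 either way
H = [["x", "x"],   # actions 0 and 1 reveal nothing about the outcome
     ["x", "x"],
     ["a", "b"]]   # action 2 reveals the outcome

# Discretized posterior over p = P(outcome == 1); midpoints avoid p = 0 or 1.
GRID = [(i + 0.5) / 200 for i in range(200)]


def expected_loss(action, p):
    return (1.0 - p) * L[action][0] + p * L[action][1]


def signal_likelihood(action, signal, p):
    """P(signal | action, p): total probability of the outcomes producing `signal`."""
    probs = (1.0 - p, p)
    return sum(probs[j] for j in range(2) if H[action][j] == signal)


def update(weights, action, signal):
    """Exact Bayes update of the grid weights given one observed signal."""
    new = [w * signal_likelihood(action, signal, p) for w, p in zip(weights, GRID)]
    total = sum(new)
    return [w / total for w in new]


def run(true_p, horizon, seed=0):
    rng = random.Random(seed)
    weights = [1.0 / len(GRID)] * len(GRID)  # uniform prior on the grid
    counts = [0] * len(L)
    for _ in range(horizon):
        # Thompson step: sample p from the posterior, act greedily for that sample.
        p = rng.choices(GRID, weights=weights, k=1)[0]
        action = min(range(len(L)), key=lambda a: expected_loss(a, p))
        outcome = 1 if rng.random() < true_p else 0
        weights = update(weights, action, H[action][outcome])
        counts[action] += 1
    return weights, counts
```

For example, `weights, counts = run(true_p=0.9, horizon=2000)` should concentrate the posterior above 0.7 and play action 1 most often. Because this sketch only plays actions that are optimal for some sampled `p`, the informative action 2 is given a loss that makes it optimal for `p` in (0.3, 0.7); if no sampled `p` ever favoured it, the posterior would never move.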

## References

The first 10 of the paper's 40 references are listed below.

Efficient Partial Monitoring with Prior Information

- Field: Computer Science
- Venue: NIPS
- Year: 2014

This work proposes BPM, a family of efficient algorithms whose core idea is to track the outcome distribution with an ellipsoid centered on the estimated distribution, and shows that BPM provably achieves a near-optimal regret rate for locally observable partial-monitoring problems against stochastic opponents.

Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments

- Fields: Mathematics, Computer Science
- Venue: COLT
- Year: 2011

This work provides a computationally efficient learning algorithm that achieves the minimax regret up to a logarithmic factor for any game with finitely many actions and outcomes.

Discrete Prediction Games with Arbitrary Feedback and Loss

- Fields: Mathematics, Computer Science
- Venue: COLT/EuroCOLT
- Year: 2001

This work investigates the problem of predicting a sequence when the information about previous elements (the feedback) is only partial and may depend on the predicted values, and it evaluates performance against the best constant predictor (the regret), as is common in the analysis of iterated games.
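Written out, and in notation introduced here for illustration rather than taken from the cited work, let there be $N$ actions, a loss function $\ell$, and let $I_t$ be the learner's prediction and $J_t$ the outcome at round $t$. The regret after $T$ rounds against the best constant predictor is then

$$
R_T = \sum_{t=1}^{T} \ell(I_t, J_t) - \min_{i \in \{1, \dots, N\}} \sum_{t=1}^{T} \ell(i, J_t)
$$

Under partial feedback the learner may see neither $J_t$ nor $\ell(I_t, J_t)$, only a feedback value that depends on $I_t$ and $J_t$, which is what makes keeping $R_T$ small harder than with full information.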

Information Directed Sampling for Linear Partial Monitoring

- Fields: Computer Science, Mathematics
- Venue: COLT
- Year: 2020

This work introduces information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure and proves lower bounds that classify the minimax regret of all finite games into four possible regimes.
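The core IDS step can be illustrated with a generic sketch of the information-ratio minimization (in the style of Russo and Van Roy), assuming per-action gap estimates `gaps` and information-gain estimates `gains` are already available. How these estimates are built for linear partial monitoring is not shown here and is not taken from the cited paper. Since a minimizing distribution can be supported on at most two actions, the sketch searches over pairs of actions and a grid of mixing weights; `ids_two_action` and `steps` are hypothetical names.

```python
def ids_two_action(gaps, gains, steps=1000):
    """Minimize the information ratio over mixtures of at most two actions.

    gaps[a]  : estimated expected regret (gap) of action a   (assumed given)
    gains[a] : estimated information gain of action a        (assumed given)
    Returns (i, j, q, ratio): play action i with probability q, j otherwise.
    """
    best = None
    n = len(gaps)
    for i in range(n):
        for j in range(i, n):
            for k in range(steps + 1):
                q = k / steps  # grid search over the mixing weight
                delta = q * gaps[i] + (1 - q) * gaps[j]
                gain = q * gains[i] + (1 - q) * gains[j]
                if gain <= 0:
                    continue  # ratio undefined when nothing is learned
                ratio = delta ** 2 / gain
                if best is None or ratio < best[3]:
                    best = (i, j, q, ratio)
    return best
```

With `gaps = [0.2, 0.1]` and `gains = [1.0, 0.01]`, action 0 alone has ratio 0.04 and action 1 alone has ratio 1.0, yet a mixture playing action 0 with probability about 0.98 does slightly better, which illustrates why IDS randomizes.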

Further Optimal Regret Bounds for Thompson Sampling

- Fields: Mathematics, Computer Science
- Venue: AISTATS
- Year: 2013

This work provides a novel regret analysis of Thompson Sampling that proves the first near-optimal problem-independent bound of $O(\sqrt{NT \ln T})$ on the expected regret of this algorithm, while also yielding the optimal problem-dependent bound.
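For reference, the standard Beta-Bernoulli form of Thompson Sampling for $N$ arms over a horizon of $T$ rounds can be sketched as below. The uniform Beta(1, 1) priors, the Bernoulli reward model, and the function name `thompson_bernoulli` are assumptions of this illustration; the sketch tracks pseudo-regret (the sum of gaps of the played arms), the quantity whose expectation such bounds control.

```python
import random


def thompson_bernoulli(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling with uniform Beta(1, 1) priors.

    Returns per-arm pull counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms
    failures = [0] * n_arms
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior and play the argmax.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if rng.random() < true_means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        regret += best_mean - true_means[arm]  # gap of the played arm
    pulls = [successes[a] + failures[a] for a in range(n_arms)]
    return pulls, regret
```

For instance, `thompson_bernoulli([0.2, 0.8], 2000)` should pull the better arm in the large majority of rounds.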

Thompson Sampling for Contextual Bandits with Linear Payoffs

- Fields: Computer Science, Mathematics
- Venue: ICML
- Year: 2013

This work designs and analyzes a generalization of the Thompson Sampling algorithm to the stochastic contextual multi-armed bandit problem with linear payoff functions, where the contexts are provided by an adaptive adversary.
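A common Gaussian-posterior form of Thompson Sampling for linear payoffs is sketched below. This is a simplified illustration, not the cited paper's exact algorithm: contexts are drawn at random rather than chosen by an adaptive adversary, and the posterior scale `v`, the noise level `noise_sd`, and the number of arms per round `n_arms` are hypothetical parameters. The sketch keeps a ridge-style precision matrix `B`, a reward-weighted sum of contexts `f`, samples a parameter from $\mathcal{N}(B^{-1} f, v^2 B^{-1})$, and plays the arm whose context scores highest under that sample.

```python
import numpy as np


def linear_ts(theta_true, horizon, n_arms=5, v=0.3, noise_sd=0.1, seed=0):
    """Linear Thompson Sampling sketch with a Gaussian posterior (assumed form)."""
    rng = np.random.default_rng(seed)
    d = len(theta_true)
    B = np.eye(d)    # precision matrix, identity prior
    f = np.zeros(d)  # sum of reward-weighted chosen contexts
    regret = 0.0
    for _ in range(horizon):
        X = rng.normal(size=(n_arms, d))  # hypothetical random contexts
        mu = np.linalg.solve(B, f)        # posterior mean
        cov = v ** 2 * np.linalg.inv(B)
        cov = (cov + cov.T) / 2           # symmetrize against round-off
        theta = rng.multivariate_normal(mu, cov)
        arm = int(np.argmax(X @ theta))
        x = X[arm]
        reward = x @ theta_true + rng.normal(scale=noise_sd)
        B += np.outer(x, x)
        f += reward * x
        expected = X @ theta_true
        regret += expected.max() - expected[arm]
    return np.linalg.solve(B, f), regret
```

Calling `linear_ts(np.array([1.0, -0.5, 0.25]), 500)` should return an estimate close to the true parameter along with the accumulated pseudo-regret.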

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

- Fields: Computer Science, Mathematics
- Venue: NIPS
- Year: 2015

This paper derives a logarithmic distribution-dependent regret lower bound that characterizes the hardness of the problem, and proves an asymptotically optimal regret upper bound for PM-DMED-Hinge that matches this lower bound.

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

- Fields: Computer Science, Mathematics
- Venue: COLT
- Year: 2019

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then…