Corpus ID: 219721467

Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

@article{Tsuchiya2020AnalysisAD,
  title={Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring},
  author={Taira Tsuchiya and Junya Honda and Masashi Sugiyama},
  journal={ArXiv},
  year={2020},
  volume={abs/2006.09668}
}
We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms on a variety of online decision-making problems, its properties for stochastic partial monitoring have not been theoretically investigated, and the existing algorithm relies on a heuristic approximation of the posterior distribution. To mitigate these problems, we present a novel Thompson-sampling-based… 


References

Efficient Partial Monitoring with Prior Information
TLDR
BPM is proposed, a family of new efficient algorithms whose core is to track the outcome distribution with an ellipsoid centered around the estimated distribution, and it is shown that the algorithm provably enjoys near-optimal regret rate for locally observable partial-monitoring problems against stochastic opponents.
Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
TLDR
A computationally efficient learning algorithm is provided that achieves the minimax regret within a logarithmic factor for any game with finitely many actions and outcomes.
Discrete Prediction Games with Arbitrary Feedback and Loss
TLDR
This work investigates the problem of predicting a sequence when the information about the previous elements (feedback) is only partial and possibly dependent on the predicted values, and evaluates the performance against the best constant predictor (regret), as is common in iterated game analysis.
Information Directed Sampling for Linear Partial Monitoring
TLDR
This work introduces information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure and proves lower bounds that classify the minimax regret of all finite games into four possible regimes.
Further Optimal Regret Bounds for Thompson Sampling
TLDR
A novel regret analysis for Thompson Sampling is provided that proves the first near-optimal problem-independent bound of O(√(NT ln T)) on the expected regret of this algorithm, and simultaneously provides the optimal problem-dependent bound.
Thompson Sampling for Contextual Bandits with Linear Payoffs
TLDR
A generalization of the Thompson Sampling algorithm is designed and analyzed for the stochastic contextual multi-armed bandit problem with linear payoff functions, where the contexts are provided by an adaptive adversary.
Asymptotically Efficient Adaptive Allocation Rules
Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring
TLDR
This paper derives a logarithmic distribution-dependent regret lower bound that defines the hardness of the problem and derives an asymptotically optimal regret upper bound of PM-DMED-Hinge that matches the lower bound.
An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then
...