Corpus ID: 219721467

Analysis and Design of Thompson Sampling for Stochastic Partial Monitoring

Taira Tsuchiya, Junya Honda, Masashi Sugiyama
We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback. While Thompson sampling is one of the most promising algorithms on a variety of online decision-making problems, its properties for stochastic partial monitoring have not been theoretically investigated, and the existing algorithm relies on a heuristic approximation of the posterior distribution. To mitigate these problems, we present a novel Thompson-sampling-based… 



Efficient Partial Monitoring with Prior Information
This work proposes BPM, a family of new efficient algorithms whose core is to track the outcome distribution with an ellipsoid centered around the estimated distribution, and shows that the algorithm provably enjoys a near-optimal regret rate for locally observable partial-monitoring problems against stochastic opponents.
Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
A computationally efficient learning algorithm is provided that achieves the minimax regret within a logarithmic factor for any game with finitely many actions and outcomes.
Discrete Prediction Games with Arbitrary Feedback and Loss
This work investigates the problem of predicting a sequence when the information about the previous elements (feedback) is only partial and possibly dependent on the predicted values, and evaluates the performance against the best constant predictor (regret), as is common in iterated game analysis.
Information Directed Sampling for Linear Partial Monitoring
This work introduces information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure and proves lower bounds that classify the minimax regret of all finite games into four possible regimes.
Sampling from a multivariate Gaussian distribution truncated on a simplex: A review
This paper reviews recent Monte Carlo methods for sampling from multivariate Gaussian distributions restricted to the standard simplex, and describes and analyzes two Hamiltonian Monte Carlo methods.
Further Optimal Regret Bounds for Thompson Sampling
A novel regret analysis for Thompson Sampling is provided that proves the first near-optimal problem-independent bound of O(√(NT ln T)) on the expected regret of this algorithm, and simultaneously provides the optimal problem-dependent bound.
Thompson Sampling for Contextual Bandits with Linear Payoffs
A generalization of the Thompson Sampling algorithm for the stochastic contextual multi-armed bandit problem with linear payoff functions, where the contexts are provided by an adaptive adversary, is designed and analyzed.
Asymptotically efficient adaptive allocation rules
Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring
This paper derives a logarithmic distribution-dependent regret lower bound that defines the hardness of the problem, and derives an asymptotically optimal regret upper bound for PM-DMED-Hinge that matches the lower bound.