Thompson Sampling for Unimodal Bandits
@article{Yang2021ThompsonSF,
  title   = {Thompson Sampling for Unimodal Bandits},
  author  = {Long Yang and Zhao Li and Zehong Hu and Shasha Ruan and Shijian Li and Gang Pan and Hongyang Chen},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2106.08187}
}
In this paper, we propose a Thompson Sampling algorithm for unimodal bandits, in which the expected reward is unimodal over the partially ordered arms. To better exploit the unimodal structure, at each step our algorithm, instead of exploring the entire decision space, makes its decision according to the posterior distribution only within the neighborhood of the arm with the highest empirical mean estimate. We theoretically prove that, for Bernoulli rewards, the regret of our algorithm reaches the…
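The mechanism in the abstract (posterior sampling restricted to the neighborhood of the arm with the best empirical mean) can be illustrated with a short sketch. This is a minimal illustration under assumptions not fixed by the text above: a line graph over the arms, Beta(1,1) priors for Bernoulli rewards, and a hypothetical simulation harness; the function name `unimodal_thompson_sampling` and the `means` parameter are illustrative, not from the paper.

```python
import numpy as np

def unimodal_thompson_sampling(means, horizon, seed=0):
    """Sketch: Thompson Sampling restricted to the empirical leader's
    neighborhood on a line graph of arms (Bernoulli rewards, Beta(1,1) priors).
    `means` holds the true arm means, used only to simulate rewards."""
    rng = np.random.default_rng(seed)
    k = len(means)
    alpha = np.ones(k)      # Beta posterior: 1 + observed successes
    beta = np.ones(k)       # Beta posterior: 1 + observed failures
    pulls = np.zeros(k)
    rewards = np.zeros(k)
    total_reward = 0.0
    for _ in range(horizon):
        # Empirical leader: the arm with the highest empirical mean
        # (unpulled arms get +inf so every arm is tried at least once).
        emp = np.where(pulls > 0, rewards / np.maximum(pulls, 1), np.inf)
        leader = int(np.argmax(emp))
        # Restrict attention to the leader and its graph neighbors.
        candidates = [a for a in (leader - 1, leader, leader + 1) if 0 <= a < k]
        # Posterior sampling only within that neighborhood, not over all arms.
        samples = {a: rng.beta(alpha[a], beta[a]) for a in candidates}
        arm = max(samples, key=samples.get)
        r = float(rng.random() < means[arm])   # simulated Bernoulli reward
        pulls[arm] += 1
        rewards[arm] += r
        alpha[arm] += r
        beta[arm] += 1.0 - r
        total_reward += r
    return total_reward
```

For example, with a unimodal profile such as `means = [0.1, 0.3, 0.55, 0.7, 0.45]`, the sampler concentrates its pulls around the peak arm after a short transient; the leader-plus-neighborhood restriction is what distinguishes this sketch from vanilla Thompson Sampling over all arms.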
References
Showing 1-10 of 34 references
Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms
- Computer Science, ICML
- 2014
It is shown that combining an appropriate discretization of the set of arms with the UCB algorithm yields an order-optimal regret and, in practice, outperforms recently proposed algorithms designed to exploit the unimodal structure.
Unimodal Bandits
- Computer Science, ICML
- 2011
We consider multiarmed bandit problems where the expected reward is unimodal over partially ordered arms. In particular, the arms may belong to a continuous interval or correspond to vertices in a…
Unimodal Thompson Sampling for Graph-Structured Arms
- Computer Science, AAAI
- 2017
A Thompson Sampling-based algorithm is proposed whose asymptotic pseudo-regret matches the lower bound for the considered setting, and it is shown that Bayesian MAB algorithms dramatically outperform frequentist ones.
Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling
- Computer Science, ALT
- 2020
An asymptotically optimal bound is proved on the frequentist regret of UTS, supported by simulations showing the method's significant improvement over the state of the art.
Thompson Sampling Algorithms for Mean-Variance Bandits
- Computer Science, ICML
- 2020
Thompson Sampling-style algorithms for mean-variance MAB are developed, with comprehensive regret analyses for Gaussian and Bernoulli bandits under fewer assumptions, and are shown to significantly outperform existing LCB-based algorithms for all risk tolerances.
Further Optimal Regret Bounds for Thompson Sampling
- Computer Science, AISTATS
- 2013
A novel regret analysis for Thompson Sampling is provided that proves the first near-optimal problem-independent bound of O(√(NT ln T)) on the expected regret of this algorithm, while simultaneously providing the optimal problem-dependent bound.
Learning to Optimize via Posterior Sampling
- Computer Science, Math. Oper. Res.
- 2014
A Bayesian regret bound for posterior sampling is established that applies broadly and can be specialized to many model classes; it depends on a new notion the authors refer to as the eluder dimension, which measures the degree of dependence among action rewards.
Kullback–Leibler upper confidence bounds for optimal sequential allocation
- Computer Science
- 2013
The main contribution is a unified finite-time analysis of the regret of these algorithms that asymptotically matches the lower bounds of Lai and Robbins (1985) and Burnetas and Katehakis (1996), respectively.
Thompson Sampling for Complex Online Problems
- Computer Science, ICML
- 2014
A frequentist regret bound for Thompson sampling is proved in a very general setting involving parameter, action, and observation spaces and a likelihood function over them, and improved regret bounds are derived for classes of complex bandit problems involving the selection of subsets of arms, including the first nontrivial regret bounds for nonlinear reward feedback from subsets.
Finite-time Analysis of the Multiarmed Bandit Problem
- Computer Science, Machine Learning
- 2002
This work shows that the optimal logarithmic regret is achievable uniformly over time, with simple and efficient policies such as UCB1 (see the sketch after this list), for all reward distributions with bounded support.
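Several entries above build on the UCB1 index from this last reference: the empirical mean plus a sqrt(2 ln t / n) exploration bonus. A minimal sketch follows; the function name `ucb1` and the simulation harness are illustrative assumptions, not code from any cited paper.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Sketch of UCB1: pull each arm once, then always pull the arm
    maximizing empirical mean + sqrt(2 * ln(t) / pulls)."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: one pull per arm
        else:
            arm = max(range(k),
                      key=lambda a: sums[a] / pulls[a]
                                    + math.sqrt(2.0 * math.log(t) / pulls[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0  # simulated Bernoulli draw
        pulls[arm] += 1
        sums[arm] += reward
    return pulls  # pull counts concentrate on the best arm as t grows
```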