# Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

```bibtex
@article{Jin2022FiniteTimeRO,
  title   = {Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits},
  author  = {Tianyuan Jin and Pan Xu and X. Xiao and Anima Anandkumar},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2206.03520}
}
```

We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution comes from a one-dimensional exponential family, covering many common reward distributions including Bernoulli, Gaussian, Gamma, and Exponential. We propose a Thompson sampling algorithm, termed ExpTS, which uses a novel sampling distribution to avoid under-estimation of the optimal arm. We provide a tight regret analysis for ExpTS, which simultaneously yields both the…
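For context, the vanilla Thompson sampling loop that ExpTS builds on can be sketched for the Bernoulli special case. The environment, horizon, and Beta(1, 1) prior below are illustrative assumptions, not the paper's ExpTS sampling distribution:

```python
import random

def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Vanilla Thompson sampling for Bernoulli bandits with Beta(1, 1) priors.

    Baseline sketch only: ExpTS replaces the posterior with a carefully
    designed sampling distribution to avoid under-estimating the optimal arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alphas = [1] * k  # Beta posterior: alpha = 1 + successes
    betas = [1] * k   # Beta posterior: beta = 1 + failures
    total_reward = 0
    for _ in range(horizon):
        # Draw one sample per arm from its posterior, play the argmax.
        samples = [rng.betavariate(alphas[i], betas[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alphas[arm] += reward
        betas[arm] += 1 - reward
        total_reward += reward
    return total_reward

# Usage: total reward over a horizon of 5000 pulls on a 3-armed instance.
reward = thompson_sampling_bernoulli([0.3, 0.5, 0.7], horizon=5000)
```

As the posteriors concentrate, the sampled indices of suboptimal arms fall below the best arm's with high probability, so the per-round regret vanishes.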

## References

Showing 1–10 of 35 references.

### Further Optimal Regret Bounds for Thompson Sampling

- Computer Science · AISTATS
- 2013

A novel regret analysis for Thompson Sampling is provided that proves the first near-optimal problem-independent bound of $O(\sqrt{NT \ln T})$ on the expected regret of this algorithm, and simultaneously provides the optimal problem-dependent bound.

### Thompson Sampling for 1-Dimensional Exponential Family Bandits

- Computer Science, Mathematics · NIPS
- 2013

This work proves asymptotic optimality of the Thompson Sampling algorithm using the Jeffreys prior. Using the closed forms for the Kullback-Leibler divergence and Fisher information available in an exponential family, it gives a finite-time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right.

### Prior-free and prior-dependent regret bounds for Thompson Sampling

- Mathematics, Computer Science · 2014 48th Annual Conference on Information Sciences and Systems (CISS)
- 2014

It is shown that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by $14\sqrt{nK}$, and that in the case of priors for the setting of Bubeck et al.

### Thompson Sampling for Combinatorial Semi-Bandits

- Computer Science · ICML
- 2018

The first distribution-dependent regret bound of $O(mK_{\max}\log T / \Delta_{\min})$ is obtained, and it is shown that one cannot directly replace the exact offline oracle with an approximation oracle in the TS algorithm, even for the classical MAB problem.

### Linear Thompson Sampling Revisited

- Computer Science · AISTATS
- 2017

Thompson sampling can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a UCB-like approach.

### MOTS: Minimax Optimal Thompson Sampling

- Computer Science · ICML
- 2021

This work proposes MOTS, a variant of Thompson sampling that adaptively clips the sampling result of the chosen arm at each time step; MOTS is the first Thompson-sampling-type algorithm to achieve minimax optimality for multi-armed bandit problems.
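The clipping idea in that summary can be sketched with a Gaussian posterior surrogate. The threshold form and the constant `alpha` below are simplified illustrative assumptions, not the paper's exact expressions:

```python
import math
import random

def mots_style_sample(mu_hat, n_pulls, horizon, n_arms, rng, alpha=4.0):
    """Draw a Thompson sample and clip it from above, in the spirit of MOTS.

    Illustrative sketch: the sample of an arm is capped at a UCB-like level
    so that over-optimistic draws cannot inflate the minimax regret. The
    exact threshold and constants differ in the paper.
    """
    theta = rng.gauss(mu_hat, 1.0 / math.sqrt(n_pulls))
    # log+(x) = max(log x, 0): the clipping level tightens to mu_hat as the
    # arm accumulates pulls relative to T / K.
    log_plus = max(math.log(horizon / (n_arms * n_pulls)), 0.0)
    tau = mu_hat + math.sqrt(alpha / n_pulls * log_plus)
    return min(theta, tau)
```

For a heavily pulled arm the cap collapses to the empirical mean, which is exactly the mechanism that removes the extra $\sqrt{\log K}$-type factors from the worst-case regret.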

### Doubly Robust Thompson Sampling for linear payoffs

- Computer Science · NeurIPS
- 2021

A novel multi-armed contextual bandit algorithm is proposed that applies the doubly robust estimator from the missing-data literature to Thompson Sampling with linear contexts (LinTS), improving the regret bound of LinTS by a factor of $\sqrt{d}$.

### KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints

- Computer Science · ArXiv
- 2018

This self-contained contribution simultaneously presents state-of-the-art techniques for regret minimization in bandit models, and an elementary construction of non-asymptotic confidence bounds based on the empirical likelihood method for bounded distributions.

### Finite-time Analysis of the Multiarmed Bandit Problem

- Computer Science · Machine Learning
- 2004

This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.

### The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond

- Computer Science · COLT
- 2011

It is proved that, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB or UCB2, and that, in the special case of Bernoulli rewards, it reaches the lower bound of Lai and Robbins.
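For Bernoulli rewards, the KL-UCB index is the largest mean $q$ whose KL divergence from the empirical mean stays within an exploration budget; it is computed by bisection, since $\mathrm{kl}(\hat p, q)$ is increasing in $q \ge \hat p$. The exploration term below follows a common presentation and is an assumption, not a unique canonical choice:

```python
import math

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n_pulls, t, c=3.0, tol=1e-6):
    """Largest q >= p_hat with n * kl(p_hat, q) <= log t + c * log log t.

    Solved by bisection on [p_hat, 1]; kl(p_hat, .) is monotone there.
    """
    budget = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / n_pulls
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(p_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo
```

As an arm accumulates pulls the budget shrinks and the index contracts toward the empirical mean, which is what drives the per-arm pull count to the Lai–Robbins rate $\log t / \mathrm{kl}(\mu_i, \mu^*)$.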