# Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

@article{Jin2022FiniteTimeRO,
  title   = {Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits},
  author  = {Tianyuan Jin and Pan Xu and X. Xiao and Anima Anandkumar},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2206.03520}
}
• Published 7 June 2022
• Computer Science
• ArXiv
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution belongs to a one-dimensional exponential family, covering many common reward distributions including Bernoulli, Gaussian, Gamma, and Exponential. We propose a Thompson sampling algorithm, termed ExpTS, which uses a novel sampling distribution to avoid under-estimation of the optimal arm. We provide a tight regret analysis for ExpTS, which simultaneously yields both the…
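For concreteness, vanilla Thompson sampling on the Bernoulli member of this family can be sketched as below. This is a minimal illustration only: ExpTS's novel sampling distribution from the paper is not reproduced here, and the arm means, horizon, and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_sampling_bernoulli(true_means, horizon):
    """Vanilla TS for Bernoulli bandits with Beta(1, 1) priors.

    Returns the cumulative pseudo-regret over `horizon` rounds.
    """
    k = len(true_means)
    successes = np.ones(k)  # Beta posterior alpha parameters
    failures = np.ones(k)   # Beta posterior beta parameters
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample from each arm's posterior; play the argmax.
        samples = rng.beta(successes, failures)
        arm = int(np.argmax(samples))
        reward = float(rng.random() < true_means[arm])
        successes[arm] += reward
        failures[arm] += 1.0 - reward
        regret += best - true_means[arm]
    return regret

print(thompson_sampling_bernoulli([0.5, 0.6], 2000))
```

Because the posterior for each arm concentrates as it is pulled, the suboptimal arm is sampled highest less and less often, which is what drives the logarithmic regret guarantees discussed in the references below.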

## References

Showing 1–10 of 35 references.

• AISTATS, 2013 (Computer Science). A novel regret analysis for Thompson Sampling is provided that proves the first near-optimal problem-independent bound of O(√(NT ln T)) on the expected regret of this algorithm, and simultaneously provides the optimal problem-dependent bound.
• NIPS, 2013 (Computer Science, Mathematics). This work proves asymptotic optimality of the Thompson Sampling algorithm using the Jeffreys prior, exploiting the closed forms for Kullback–Leibler divergence and Fisher information available in an exponential family to give a finite-time exponential concentration inequality for posterior distributions on exponential families that may be of interest in its own right.
• 2014 48th Annual Conference on Information Sciences and Systems (CISS), 2014 (Mathematics, Computer Science). It is shown that Thompson Sampling attains an optimal prior-free bound in the sense that for any prior distribution its Bayesian regret is bounded from above by 14√(nK), and that in the case of priors for the setting of Bubeck et al.
• The first distribution-dependent regret bound of O(m K_max log T / Δ_min) is obtained, and it is shown that one cannot directly replace the exact offline oracle with an approximation oracle in the TS algorithm, even for the classical MAB problem.
• AISTATS, 2017 (Computer Science). Thompson sampling can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional √d regret factor compared to a UCB-like approach.
• ICML, 2021 (Computer Science). MOTS, a variant of Thompson sampling that adaptively clips the sampling result of the chosen arm at each time step, is the first Thompson-sampling-type algorithm to achieve minimax optimality for multi-armed bandit problems.
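The clipping step described in this entry can be sketched as below for Gaussian-style posterior samples. The threshold form and the constant `alpha` are illustrative assumptions in the spirit of a UCB-style cap, not the tuned values from the MOTS paper.

```python
import math
import random

random.seed(1)

def clipped_posterior_sample(emp_mean, pulls, horizon, n_arms, alpha=4.0):
    """Draw a Gaussian posterior-style sample for one arm and clip it
    from above at a UCB-style threshold, mimicking the clipping idea.

    `alpha` and the exact threshold are placeholders for illustration.
    """
    theta = random.gauss(emp_mean, math.sqrt(alpha / pulls))
    # Cap the sample so a lucky draw cannot exceed a confidence-style bound.
    log_term = max(0.0, math.log(horizon / (n_arms * pulls)))
    cap = emp_mean + math.sqrt(alpha / pulls * log_term)
    return min(theta, cap)
```

Clipping from above limits how optimistic a single sample can be about a well-explored arm, which is the mechanism the entry credits for removing the extra factor that vanilla Thompson sampling pays in the minimax regime.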
• NeurIPS, 2021 (Computer Science). A novel multi-armed contextual bandit algorithm is proposed that applies the doubly-robust estimator from the missing-data literature to Thompson Sampling with contexts (LinTS), improving the regret bound of LinTS by a factor of √d.
• ArXiv, 2018 (Computer Science). This self-contained contribution simultaneously presents state-of-the-art techniques for regret minimization in bandit models and an elementary construction of non-asymptotic confidence bounds based on the empirical likelihood method for bounded distributions.
• Machine Learning, 2004 (Computer Science). This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, for all reward distributions with bounded support.
• COLT, 2011 (Computer Science). It is proved that, for arbitrary bounded rewards, the KL-UCB algorithm satisfies a uniformly better regret bound than UCB or UCB2, and that in the special case of Bernoulli rewards it reaches the lower bound of Lai and Robbins.