Sub-sampling for Multi-armed Bandits

Akram Baransi, Odalric-Ambrym Maillard, Shie Mannor
The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it does not need to know a set of reward distributions in advance, nor the range of the rewards. It…
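The abstract does not spell out the decision rule, so the following is only a minimal sketch of what a sub-sampling rule for two arms might look like, not the authors' definition. It assumes the arm with more observations is sub-sampled without replacement down to the size of the other arm, that the two empirical means are then compared, and that ties go to the less-pulled arm. The names `subsample_choice` and `run_two_armed_bandit` are hypothetical, and the Bernoulli rewards are used only for the simulation.

```python
import random


def subsample_choice(rewards_a, rewards_b, rng=random):
    """Return 0 or 1: the arm to pull next (illustrative rule, an assumption)."""
    n = min(len(rewards_a), len(rewards_b))
    if n == 0:
        # An arm with no observations yet is pulled first.
        return 0 if len(rewards_a) == 0 else 1
    # Sub-sample both histories to the same size n, without replacement.
    # For the shorter history this is just a permutation of all its rewards.
    sample_a = rng.sample(rewards_a, n)
    sample_b = rng.sample(rewards_b, n)
    mean_a = sum(sample_a) / n
    mean_b = sum(sample_b) / n
    if mean_a == mean_b:
        # Assumed tie-break: favour the arm with fewer observations.
        return 0 if len(rewards_a) <= len(rewards_b) else 1
    return 0 if mean_a > mean_b else 1


def run_two_armed_bandit(means, horizon, seed=0):
    """Simulate the rule on two Bernoulli arms and return the pull counts."""
    rng = random.Random(seed)
    history = [[], []]
    pulls = [0, 0]
    for _ in range(horizon):
        arm = subsample_choice(history[0], history[1], rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        history[arm].append(reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(run_two_armed_bandit((0.9, 0.1), horizon=500))
```

The rule in this sketch only compares empirical means on equal-sized samples, so it needs neither a parametric family for the rewards nor their range, which is the kind of flexibility the abstract describes.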


