The Sample Complexity of Exploration in the Multi-Armed Bandit Problem

Shie Mannor and John N. Tsitsiklis
Journal of Machine Learning Research
We consider the multi-armed bandit problem under the PAC (“probably approximately correct”) model. It was shown by Even-Dar et al. (2002) that, given $n$ arms, a total of $O\big((n/\varepsilon^2)\log(1/\delta)\big)$ trials suffices to find an $\varepsilon$-optimal arm with probability at least $1-\delta$. We establish a matching lower bound on the expected number of trials under any sampling policy. We furthermore generalize the lower bound and show an explicit dependence on the (unknown) statistics of the arms. We also…
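To make the $O\big((n/\varepsilon^2)\log(1/\delta)\big)$ upper bound concrete, here is a minimal sketch of Median Elimination, the algorithm Even-Dar et al. (2002) used to achieve it. This is not the method of the present paper, whose contribution is the matching lower bound. The sketch assumes rewards in $[0,1]$. The constants ($\varepsilon_1=\varepsilon/4$, $\delta_1=\delta/2$, $(\varepsilon_\ell/2)^{-2}\ln(3/\delta_\ell)$ samples per round) follow the usual statement of the algorithm and should be checked against Even-Dar et al. before relying on them. The names `median_elimination`, `pull`, and `bernoulli_pull` are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def median_elimination(
    pull: Callable[[int], float], n_arms: int, eps: float, delta: float
) -> Tuple[int, int]:
    """Sketch of Median Elimination (Even-Dar et al., 2002); requires n_arms >= 1.

    Returns (chosen arm, total number of pulls). `pull(i)` is a hypothetical
    sampler assumed to return a reward in [0, 1] drawn from arm i.
    """
    surviving: List[int] = list(range(n_arms))
    eps_l, delta_l = eps / 4, delta / 2
    total_pulls = 0
    while len(surviving) > 1:
        # Sample each surviving arm (eps_l / 2)^-2 * ln(3 / delta_l) times.
        m = math.ceil((4 / eps_l ** 2) * math.log(3 / delta_l))
        means: Dict[int, float] = {
            i: sum(pull(i) for _ in range(m)) / m for i in surviving
        }
        total_pulls += m * len(surviving)
        # Discard the empirically worse half (arms below the median).
        ranked = sorted(surviving, key=lambda i: means[i], reverse=True)
        surviving = ranked[: math.ceil(len(ranked) / 2)]
        # Tighten the accuracy target and halve the remaining failure budget.
        eps_l, delta_l = 0.75 * eps_l, delta_l / 2
    return surviving[0], total_pulls


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = [0.2, 0.5, 0.9]  # hypothetical Bernoulli arms

    def bernoulli_pull(i: int) -> float:
        return 1.0 if rng.random() < true_means[i] else 0.0

    arm, pulls = median_elimination(bernoulli_pull, len(true_means), eps=0.1, delta=0.05)
    print(f"chosen arm: {arm}, pulls used: {pulls}")
```

With such well-separated means, the returned index is arm 2 with overwhelming probability. Halving the arm set is what removes the extra $\log n$ factor. Sampling every arm uniformly costs $O\big((n/\varepsilon^2)\log(n/\delta)\big)$. Here, in each round $|S_\ell|$ halves while $1/\varepsilon_\ell^2$ grows only by $16/9$, so the per-round cost shrinks geometrically (ratio $8/9$, up to the additive growth of $\log(1/\delta_\ell)$). The total is therefore dominated by the first round.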