Finite-time Analysis of the Multiarmed Bandit Problem
@article{Auer2004FinitetimeAO, title={Finite-time Analysis of the Multiarmed Bandit Problem}, author={P. Auer and Nicol{\`o} Cesa-Bianchi and P. Fischer}, journal={Machine Learning}, year={2002}, volume={47}, pages={235--256} }
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss incurred because the globally optimal policy is not followed at all times. One of the simplest examples of the exploration/exploitation dilemma is the multi…
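The notion of regret in the abstract can be made concrete with a small simulation. The following is a minimal sketch, not the authors' code: it assumes Bernoulli reward arms and plays the UCB1 index policy associated with this paper (each arm once, then the arm maximizing its empirical mean plus sqrt(2 ln n / n_j), where n is the number of plays so far and n_j the number of plays of arm j). The function name `ucb1`, the arm means, the horizon, and the seed are illustrative assumptions.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms (hypothetical setup).

    Returns (counts, regret): pulls per arm and the expected regret,
    i.e. the expected reward lost by not always playing the best arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of times each arm was played
    totals = [0.0] * k  # sum of observed rewards per arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play every arm once
        else:
            n = t - 1    # total number of plays so far
            # index = empirical mean + confidence radius
            arm = max(
                range(k),
                key=lambda j: totals[j] / counts[j]
                + math.sqrt(2.0 * math.log(n) / counts[j]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    best = max(means)
    # Expected regret: gap to the best arm, weighted by pulls of each arm.
    regret = sum((best - m) * c for m, c in zip(means, counts))
    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.9, 0.1], horizon=2000)
    print("pulls per arm:", counts)
    print("expected regret:", regret)
```

The regret here is computed from the true means, which the simulation knows but a real policy would not; it is used only to measure how much exploration cost the run.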
4,416 Citations
An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- Mathematics, Computer Science
- Machine Learning
- 2011
- 53
Risk-Averse Explore-Then-Commit Algorithms for Finite-Time Bandits
- Computer Science, Mathematics
- 2019 IEEE 58th Conference on Decision and Control (CDC)
- 2019
- 5
- Highly Influenced
A Structured Multiarmed Bandit Problem and the Greedy Policy
- Computer Science
- IEEE Transactions on Automatic Control
- 2009
- 44
- Highly Influenced
Multi-armed bandit problems with heavy-tailed reward distributions
- Computer Science, Mathematics
- 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2011
- 21
- Highly Influenced
Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards
- Computer Science, Mathematics
- ArXiv
- 2014
- 66
- Highly Influenced
A structured multiarmed bandit problem and the greedy policy
- Computer Science, Mathematics
- 2008 47th IEEE Conference on Decision and Control
- 2008
- 48