Finite-time Analysis of the Multiarmed Bandit Problem

@article{Auer2004FinitetimeAO,
  title={Finite-time Analysis of the Multiarmed Bandit Problem},
  author={P. Auer and Nicol{\`o} Cesa-Bianchi and P. Fischer},
  journal={Machine Learning},
  year={2002},
  volume={47},
  pages={235--256}
}
  • P. Auer, Nicolò Cesa-Bianchi, P. Fischer
  • Published 2002
  • Computer Science
  • Machine Learning
  • Reinforcement learning policies face the exploration versus exploitation dilemma, i.e., the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss incurred because the globally optimal policy is not followed at all times. One of the simplest examples of the exploration/exploitation dilemma is the multi…
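To make the notion of regret in the abstract concrete, here is a minimal Python sketch. It assumes the common expected-regret formulation: the best arm's mean times the number of plays, minus the expected reward of the plays actually made. The function name `pseudo_regret`, its parameters, and the example numbers are hypothetical illustrations, not taken from the paper.

```python
from typing import Sequence


def pseudo_regret(means: Sequence[float], pulls: Sequence[int]) -> float:
    """Expected loss versus always playing the arm with the highest mean.

    `means[i]` is the (assumed known) expected reward of arm i and
    `pulls[i]` is how many times a policy played that arm.
    """
    if len(means) != len(pulls):
        raise ValueError("means and pulls must have the same length")
    total_plays = sum(pulls)
    best_mean = max(means)
    # Expected reward of the policy that always plays the best arm.
    optimal_reward = best_mean * total_plays
    # Expected reward of the plays that were actually made.
    collected_reward = sum(m * n for m, n in zip(means, pulls))
    return optimal_reward - collected_reward


# Hypothetical two-armed example: arm 1 is optimal, arm 0 was played 10 times.
example = pseudo_regret([0.5, 0.9], [10, 90])
```

In this example each of the 10 plays of the weaker arm costs 0.4 in expectation, so the regret is about 4. A policy that explores less often on suboptimal arms, while still identifying the best arm, accumulates less regret.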
    4,416 Citations

    An asymptotically optimal policy for finite support models in the multiarmed bandit problem
    • 53
    Lenient Regret for Multi-Armed Bandits
    • 1
    Risk-Averse Explore-Then-Commit Algorithms for Finite-Time Bandits
    • 5
    • Highly Influenced
    Pure Exploration in Multi-armed Bandits Problems
    • 289
    A Structured Multiarmed Bandit Problem and the Greedy Policy
    • 44
    • Highly Influenced
    On the evolution of the expected gain of a greedy action in the bandit problem
    • 2
    Multi-armed bandit problems with heavy-tailed reward distributions
    • K. Liu, Qing Zhao
    • Computer Science, Mathematics
    • 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
    • 2011
    • 21
    • Highly Influenced
    Pure Exploration for Multi-Armed Bandit Problems
    • 26
    Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards
    • 66
    • Highly Influenced
    A structured multiarmed bandit problem and the greedy policy
    • 48

    References

    Gambling in a rigged casino: The adversarial multi-armed bandit problem
    • 738
    Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem
    • 470
    Q-Learning for Bandit Problems
    • 25
    Multi-Armed bandit problem revisited
    • 40
    Nonparametric bandit methods
    • 33
    Reinforcement Learning: An Introduction
    • 27,472
    Reinforcement Learning: A Survey
    • 6,658
    Learning in embedded systems
    • 731
    Learning to Act Using Real-Time Dynamic Programming
    • 1,264