Kullback–Leibler upper confidence bounds for optimal sequential allocation

@article{Capp2013KullbackLeiblerUC,
  title={Kullback–Leibler upper confidence bounds for optimal sequential allocation},
  author={Olivier Capp{\'e} and Aur{\'e}lien Garivier and Odalric-Ambrym Maillard and R{\'e}mi Munos and Gilles Stoltz},
  journal={Annals of Statistics},
  year={2013},
  volume={41},
  pages={1516--1541}
}
Abstract

We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper confidence bounds of the arm payoffs computed using the Kullback–Leibler divergence. We consider two classes of distributions for which instances of this general idea are analyzed: the kl-UCB algorithm is designed for one-parameter exponential families, and the empirical KL-UCB algorithm for bounded and …
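To make the index policy concrete, here is a minimal sketch (not the paper's reference implementation) of the kl-UCB index for Bernoulli rewards: the index of an arm with empirical mean p̂ after n pulls at round t is the largest q ≥ p̂ such that n · kl(p̂, q) ≤ log t + c log log t, found by bisection since kl(p̂, ·) is increasing on [p̂, 1]. The function names and the constant c = 3 are illustrative assumptions.

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clipping."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(p_hat, n, t, c=3.0, tol=1e-6):
    """kl-UCB index: largest q >= p_hat with
    n * kl(p_hat, q) <= log(t) + c * log(log(t)).
    Found by bisection, since kl(p_hat, .) increases on [p_hat, 1]."""
    if n == 0:
        return 1.0  # unplayed arms get the maximal index
    level = (math.log(t) + c * math.log(max(math.log(t), 1.0))) / n
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= level:
            lo = mid  # mid still satisfies the constraint; push up
        else:
            hi = mid
    return lo
```

At each round t the policy plays an arm maximizing `klucb_index`; the index shrinks toward the empirical mean as n grows and widens slowly with t, matching the exploration–exploitation trade-off analyzed in the paper.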
Citations (234)

    • Finite-time Analysis of Kullback-Leibler Upper Confidence Bounds for Optimal Adaptive Allocation with Multiple Plays and Markovian Rewards
    • On the notion of optimality in the stochastic multi-armed bandit problems
    • On Bayesian index policies for sequential resource allocation
    • Asymptotically Optimal Algorithms for Multiple Play Bandits with Partial Feedback
    • Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms
    • KL-UCB-switch: optimal regret bounds for stochastic bandits from both a distribution-dependent and a distribution-free viewpoints
    • Asymptotically optimal algorithms for budgeted multiple play bandits
    • Unimodal Bandits: Regret Lower Bounds and Optimal Algorithms (2013)
