Corpus ID: 2067244

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences

  • O. Maillard, R. Munos, G. Stoltz
  • Published in COLT 2011
  • Mathematics, Computer Science
  • We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996). Our contribution is a finite-time analysis of this algorithm: we obtain bounds whose leading terms are smaller than those of previously known algorithms with finite-time analyses (such as UCB-type algorithms).
    98 Citations
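The following is a minimal Python sketch of a Kullback-Leibler-based upper-confidence index. It covers only the special case of Bernoulli rewards, meaning distributions supported on {0, 1}. In this sketch, the index of arm a at round t is assumed to be the largest mean q ≥ μ̂_a satisfying N_a · kl(μ̂_a, q) ≤ log t, and it is computed by bisection. The exploration term log t, the helper names `bernoulli_kl`, `kl_ucb_index` and `run_kl_ucb`, and the simulated arm means are all illustrative assumptions that do not come from the paper. The paper's algorithm handles general finite supports, where the upper bound is an optimization over every distribution on the support. That general case is not shown here.

```python
import math
import random


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clip away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean: float, pulls: int, t: int, tol: float = 1e-9) -> float:
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t).

    Assumption: the exploration term is log(t); the paper's exact choice may differ.
    """
    if pulls == 0:
        return 1.0  # unexplored arm: most optimistic possible mean
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    # kl(mean, q) is increasing in q on [mean, 1], so bisection finds the boundary.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def run_kl_ucb(arm_means, horizon, seed=0):
    """Simulate the index policy on hypothetical Bernoulli arms; return pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            # Pull the arm with the largest KL-based upper confidence index.
            arm = max(
                range(k),
                key=lambda a: kl_ucb_index(sums[a] / pulls[a], pulls[a], t),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls
```

As an example, `run_kl_ucb([0.2, 0.5, 0.8], 2000)` returns the number of pulls of each arm. With these means, most pulls should go to the third arm, because a suboptimal arm is pulled only about log t / kl(μ_a, μ*) times. Bisection works here because kl(p, q) is increasing in q on [p, 1]. As a result, the set of feasible q is an interval that starts at the empirical mean.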

    Citing papers

    • A stochastic multi-armed bandit approach to nonparametric H∞-norm estimation (7 citations)
    • Finite-time Regret Bound of a Bandit Algorithm for the Semi-bounded Support Model (3 citations)
    • Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis (389 citations)
    • Sub-sampling for Multi-armed Bandits (21 citations)
    • Thompson Sampling: An Optimal Finite Time Analysis (25 citations)
    • Robust Risk-Averse Stochastic Multi-armed Bandits (30 citations)

    References

    • UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem (203 citations)
    • An Asymptotically Optimal Bandit Algorithm for Bounded Support Models (106 citations; highly influential)
    • An asymptotically optimal policy for finite support models in the multiarmed bandit problem (53 citations; highly influential)
    • The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond (401 citations)
    • Finite-time Analysis of the Multiarmed Bandit Problem (4,416 citations)
    • Exploration-exploitation tradeoff using variance estimates in multi-armed bandits (412 citations)
    • Regret Bounds and Minimax Policies under Partial Monitoring (144 citations)
    • Context tree selection: A unifying view (31 citations)
    • Optimal Adaptive Policies for Sequential Allocation Problems (164 citations; highly influential)
    • Probability Theory I (6,167 citations)