We revisit lower bounds on the regret in the case of multi-armed bandit problems. We obtain nonasymptotic bounds and provide straightforward proofs based only on well-known properties of…
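For context, the classical asymptotic benchmark that such KL-based lower bounds recover is the Lai–Robbins bound; the notation below (best arm with distribution $\nu^\star$ and mean $\mu^\star$, suboptimal arms $a$ with distributions $\nu_a$ and means $\mu_a$, regret $R_T$) is supplied here for illustration, not taken from the snippet:

```latex
\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
\;\ge\; \sum_{a \,:\, \mu_a < \mu^\star}
\frac{\mu^\star - \mu_a}{\mathrm{KL}(\nu_a, \nu^\star)} .
```

Nonasymptotic versions bound $\mathbb{E}[N_a(T)]$, the expected number of pulls of arm $a$, for every finite $T$ rather than only in the limit.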

We propose the kl-UCB algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense…
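As an illustration, here is a minimal sketch of the kl-UCB index in the Bernoulli case (a one-dimensional exponential family). The exploration level `log(t) + c*log(log(t))` and the bisection tolerance are standard but illustrative choices, not details taken from the snippet:

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), with clipping
    to avoid log(0)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, c=0.0, tol=1e-6):
    """kl-UCB index for an arm with empirical mean `mean` pulled `pulls`
    times by round t: the largest q >= mean such that
        pulls * kl(mean, q) <= log(t) + c * log(log(t)).
    Since kl(mean, .) is increasing on [mean, 1], bisection applies."""
    level = math.log(t) + c * math.log(max(math.log(t), 1.0))
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo
```

At each round the algorithm pulls an arm maximizing this index; as `pulls` grows, the index shrinks back toward the empirical mean.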

We extend Fano’s inequality, which controls the average probability of (disjoint) events in terms of the average of some Kullback-Leibler divergences, to work with arbitrary [0, 1]–valued random…
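In its classical form, with disjoint events $A_1,\dots,A_N$, distributions $P_1,\dots,P_N$, and a reference distribution $Q$, the inequality being extended can be stated (in notation chosen here for illustration) as

```latex
\mathrm{kl}\!\left(\frac{1}{N}\sum_{i=1}^N P_i(A_i),\;
\frac{1}{N}\sum_{i=1}^N Q(A_i)\right)
\;\le\; \frac{1}{N}\sum_{i=1}^N \mathrm{KL}(P_i, Q),
```

where $\mathrm{kl}$ denotes the Kullback–Leibler divergence between Bernoulli distributions. The extension replaces the indicators $\mathbf{1}_{A_i}$ by arbitrary $[0,1]$–valued random variables, i.e., it controls averages of expectations rather than only probabilities of disjoint events.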

In the context of K–armed stochastic bandits with distributions only assumed to be supported in [0, 1], we introduce a new algorithm, KL-UCB-switch, and prove that it enjoys simultaneously a…
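A minimal sketch of the switching idea, under assumptions recalled from memory rather than from the snippet: the arm index is KL-UCB-style while an arm has been pulled few times and MOSS-style afterwards, with a switch threshold of order (T/K)^(1/5) and exploration level log_+(T/(K·N)); the paper's exact constants and thresholds may differ.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_index(mean, pulls, level, tol=1e-6):
    """Largest q >= mean with pulls * kl(mean, q) <= level (bisection)."""
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

def moss_index(mean, pulls, horizon, n_arms):
    """MOSS-style index: empirical mean plus a log_+(T/(K*N)) bonus."""
    log_plus = max(math.log(horizon / (n_arms * pulls)), 0.0)
    return mean + math.sqrt(log_plus / (2 * pulls))

def kl_ucb_switch_index(mean, pulls, horizon, n_arms):
    """Switch rule (assumed form): KL-type index for rarely pulled arms,
    MOSS-type index once pulls exceed (horizon / n_arms) ** (1/5)."""
    if pulls <= (horizon / n_arms) ** 0.2:
        level = max(math.log(horizon / (n_arms * pulls)), 0.0)
        return kl_index(mean, pulls, level)
    return moss_index(mean, pulls, horizon, n_arms)
```

The switch is what lets a single algorithm match both the distribution-dependent (KL-UCB-type) and distribution-free (MOSS-type) regret guarantees.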