• Publications
Kullback–Leibler upper confidence bounds for optimal sequential allocation
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper…
  • 230
  • 53
Concentration inequalities for sampling without replacement
Concentration inequalities quantify the deviation of a random variable from a fixed value. In spite of numerous applications, such as opinion surveys or ecological counting procedures, few…
  • 90
  • 13
A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
TLDR
We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas96.
  • 98
  • 10
LSTD with Random Projections
TLDR
We study the least-squares temporal difference (LSTD) learning algorithm when a space of low dimension is generated with a random projection from a high-dimensional space.
  • 49
  • 10
Compressed Least-Squares Regression
TLDR
We consider the problem of learning, from K data, a regression function in a linear space of high dimension N using projections onto a random subspace of lower dimension M. From any algorithm minimizing the (possibly penalized) empirical risk, we provide bounds on the excess risk of the estimate computed in the projected subspace (compressed domain).
  • 102
  • 9
Latent Bandits
TLDR
We consider a multi-armed bandit problem where the reward distributions are indexed by two sets (one for arms, one for type) and can be partitioned into a small number of clusters according to the type.
  • 46
  • 3
The non-stationary stochastic multi-armed bandit problem
TLDR
We consider a variant of the stochastic multi-armed bandit with K arms where the rewards are not assumed to be identically distributed, but are generated by a non-stationary process.
  • 28
  • 3
Robust Risk-Averse Stochastic Multi-armed Bandits
TLDR
We study a variant of the standard stochastic multi-armed bandit problem when one is not interested in the arm with the best mean, but instead in maximizing some coherent risk measure criterion.
  • 30
  • 3
Streaming kernel regression with provably adaptive mean, variance, and regularization
TLDR
We tackle the problem of tuning the regularization parameter adaptively at each time step, while maintaining tight confidence bound estimates on the value of the mean function at each point.
  • 17
  • 3
Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds
TLDR
We introduce a novel tuning of the GLR test that takes a simple form involving scan statistics, based on an extension of the Laplace method for scan statistics that holds doubly-uniformly in time.
  • 8
  • 3