We consider optimal sequential allocation in the context of the so-called stochastic multi-armed bandit model. We describe a generic index policy, in the sense of Gittins (1979), based on upper…
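The sketch below illustrates an index policy from the upper-confidence-bound family. It computes a KL-based index for Bernoulli rewards: the largest mean q still compatible with the observed empirical mean, in the sense that pulls · kl(mean, q) ≤ log t. The snippet above is truncated and does not give these details. The Bernoulli reward model, the log t exploration threshold and all function names are assumptions made for this example. It is not the paper's exact policy.

```python
import math
import random


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mean, pulls, t, tol=1e-6):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t), found by bisection."""
    threshold = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # kl(mean, q) is nondecreasing in q for q >= mean, so bisection applies
        if bernoulli_kl(mean, mid) > threshold:
            hi = mid
        else:
            lo = mid
    return lo


def run_kl_ucb(means, horizon, seed=0):
    """Simulate the index policy on Bernoulli arms; return the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise its empirical mean
        else:
            arm = max(range(k), key=lambda a: kl_ucb_index(sums[a] / pulls[a], pulls[a], t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls


if __name__ == "__main__":
    print(run_kl_ucb([0.2, 0.5, 0.8], horizon=2000))
```

Some variants in the literature add a c · log log t term to the threshold. It is omitted here for simplicity.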

Concentration inequalities quantify the deviation of a random variable from a fixed value. In spite of numerous applications, such as opinion surveys or ecological counting procedures, few…
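To make the notion concrete, the sketch below uses Hoeffding's inequality, a classical bound for the mean of n i.i.d. samples in [a, b]: P(|X̄ − μ| ≥ ε) ≤ 2 exp(−2nε² / (b − a)²). This is a generic textbook example, not the inequality developed in the work summarized above. The function names and the Uniform[0, 1] simulation are hypothetical.

```python
import math
import random


def hoeffding_tail(n, eps, a=0.0, b=1.0):
    """Hoeffding's two-sided bound on P(|sample mean - mu| >= eps), samples in [a, b]."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2 / (b - a) ** 2)


def hoeffding_epsilon(n, delta, a=0.0, b=1.0):
    """Deviation eps at which the two-sided Hoeffding bound equals delta."""
    return (b - a) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def violation_rate(n, delta, trials=2000, seed=0):
    """Fraction of simulated Uniform[0, 1] sample means that miss mu = 0.5 by at least eps."""
    rng = random.Random(seed)
    eps = hoeffding_epsilon(n, delta)
    misses = sum(
        abs(sum(rng.random() for _ in range(n)) / n - 0.5) >= eps for _ in range(trials)
    )
    return misses / trials


if __name__ == "__main__":
    print(round(hoeffding_epsilon(100, 0.05), 4))  # -> 0.1358
    print(violation_rate(100, 0.05))  # typically well below delta: the bound is conservative
```

Hoeffding's bound uses only the range [a, b] and ignores the variance, so it is often loose. Variance-sensitive bounds such as Bernstein's can be tighter.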

We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996).

We study the least-squares temporal difference (LSTD) learning algorithm when a space of low dimension is generated with a random projection from a high-dimensional space.
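The following is a minimal sketch of the idea, under stated assumptions. High-dimensional features ψ(s) ∈ R^D are compressed to φ(s) = Ψψ(s) ∈ R^d using a Gaussian random matrix Ψ whose entries have variance 1/d. LSTD then solves Aθ = b, where A = Σ φ(s)(φ(s) − γφ(s'))ᵀ and b = Σ φ(s) r over the observed transitions (s, r, s'). The entry scaling, the optional ridge term `reg`, the toy chain and all helper names are illustrative choices, not taken from the paper.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular or nearly singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def random_projection(d, big_d, rng):
    """d x D matrix with i.i.d. N(0, 1/d) entries (an assumed, common scaling)."""
    scale = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, scale) for _ in range(big_d)] for _ in range(d)]


def project(psi, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in psi]


def lstd(transitions, gamma, reg=0.0):
    """LSTD on (phi_s, reward, phi_next) triples; returns the weight vector theta."""
    d = len(transitions[0][0])
    a = [[reg if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for phi, r, phi_next in transitions:
        diff = [p - gamma * q for p, q in zip(phi, phi_next)]
        for i in range(d):
            b[i] += phi[i] * r
            for j in range(d):
                a[i][j] += phi[i] * diff[j]
    return solve(a, b)


def one_hot(s, n):
    return [1.0 if j == s else 0.0 for j in range(n)]


if __name__ == "__main__":
    # Toy deterministic cycle s -> s + 1 (mod n) with one-hot "high-dimensional" features.
    n, d, gamma = 5, 3, 0.9
    rewards = [1.0, 0.0, 2.0, 0.0, -1.0]
    psi = random_projection(d, n, random.Random(0))
    feats = [project(psi, one_hot(s, n)) for s in range(n)]
    transitions = [(feats[s], rewards[s], feats[(s + 1) % n]) for s in range(n)]
    theta = lstd(transitions, gamma, reg=1e-6)
    print([round(sum(f * t for f, t in zip(feats[s], theta)), 3) for s in range(n)])
```

When d = D and Ψ is invertible, the projected features span the same space as the one-hot features, so LSTD recovers the exact values on this deterministic cycle. When d < D, the estimate is restricted to a random d-dimensional subspace and is only approximate.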

We consider the problem of learning, from K data points, a regression function in a linear space of high dimension N using projections onto a random subspace of lower dimension M. For any algorithm minimizing the (possibly penalized) empirical risk, we provide bounds on the excess risk of the estimate computed in the projected subspace (compressed domain).
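The following is a minimal sketch of regression in a compressed domain, under stated assumptions. Inputs x ∈ R^N are mapped to z = Ψx ∈ R^M using a Gaussian random matrix Ψ with N(0, 1/M) entries. Least squares, optionally ridge-penalized through `reg`, is then solved on the compressed inputs. Plain (penalized) least squares stands in here for "any algorithm minimizing the empirical risk". The helper names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular or nearly singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def random_projection(m, n, rng):
    """M x N matrix with i.i.d. N(0, 1/M) entries (an assumed scaling)."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]


def compress(psi, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in psi]


def fit_compressed_least_squares(xs, ys, m, reg=0.0, seed=0):
    """Project the K inputs to dimension m, then solve the (ridge) normal equations there."""
    psi = random_projection(m, len(xs[0]), random.Random(seed))
    zs = [compress(psi, x) for x in xs]
    gram = [[sum(z[i] * z[j] for z in zs) + (reg if i == j else 0.0) for j in range(m)]
            for i in range(m)]
    rhs = [sum(z[i] * y for z, y in zip(zs, ys)) for i in range(m)]
    beta = solve(gram, rhs)

    def predict(x):
        # Predictions are linear in the compressed input Psi x
        return sum(bi * zi for bi, zi in zip(beta, compress(psi, x)))

    return predict


if __name__ == "__main__":
    rng = random.Random(1)
    n_dim, k = 50, 40  # more dimensions than data points
    w = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(n_dim)] for _ in range(k)]
    ys = [sum(wi * xi for wi, xi in zip(w, x)) + rng.gauss(0.0, 0.1) for x in xs]
    predict = fit_compressed_least_squares(xs, ys, m=10, reg=1e-3)
    print(round(predict(xs[0]), 3), round(ys[0], 3))
```

When M = N and Ψ is invertible, nothing is lost. When M < N, the fit is restricted to an M-dimensional random subspace, which trades approximation error for a smaller estimation problem. This is the kind of trade-off that excess-risk bounds in the compressed domain quantify.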

We consider a multi-armed bandit problem where the reward distributions are indexed by two sets (one for arms, one for types) and can be partitioned into a small number of clusters according to the type.

We consider a variant of the stochastic multi-armed bandit with K arms where the rewards are not assumed to be identically distributed, but are generated by a non-stationary process.

We study a variant of the standard stochastic multi-armed bandit problem in which the goal is not to find the arm with the best mean but to maximize some coherent risk measure criterion.
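A common example of a coherent risk measure is the conditional value at risk (CVaR) at level α, which averages the worst α-fraction of outcomes. The sketch below estimates it from samples and treats rewards as "higher is better", so the lower tail is the risky one. This sign convention, the function names and the greedy arm comparison are assumptions made for illustration. This is not the algorithm studied in the paper.

```python
import math


def empirical_cvar(samples, alpha):
    """Mean of the lowest alpha-fraction of the empirical distribution (0 < alpha <= 1).

    When alpha * n is not an integer, the boundary sample receives fractional weight.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    tail = alpha * n            # probability mass alpha, expressed in samples
    k = int(math.floor(tail))   # samples lying fully inside the tail
    total = sum(xs[:k])
    if k < n:
        total += (tail - k) * xs[k]  # fractional share of the boundary sample
    return total / tail


def pick_arm_by_cvar(reward_history, alpha):
    """Greedy choice: the arm whose observed rewards have the largest empirical CVaR."""
    return max(range(len(reward_history)),
               key=lambda a: empirical_cvar(reward_history[a], alpha))


if __name__ == "__main__":
    risky = [0.0, 0.0, 0.0, 10.0]  # mean 2.5, but a bad outcome 3 times out of 4
    safe = [1.0, 1.0, 1.0, 1.0]    # mean 1.0, no downside
    print(empirical_cvar(risky, 0.25), empirical_cvar(safe, 0.25))  # -> 0.0 1.0
    print(pick_arm_by_cvar([risky, safe], 0.25))  # -> 1
```

A mean-maximizing rule would prefer the risky arm here, while the CVaR criterion prefers the safe one. The greedy rule does not explore. A bandit algorithm would add an optimism term on top of the estimate, which this sketch omits.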

We tackle the problem of tuning the regularization parameter adaptively at each time step, while maintaining tight confidence-bound estimates of the value of the mean function at each point.

We introduce a novel tuning of the generalized likelihood ratio (GLR) test that here takes a simple form involving scan statistics, based on an extension of the Laplace method for scan statistics that holds doubly uniformly in time.
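For intuition, here is the textbook GLR statistic for a single change in the mean of Gaussian observations with known standard deviation σ. Splitting x_1, ..., x_n after position s gives GLR_s = s(n − s)/n · (mean of x_1..x_s − mean of x_{s+1}..x_n)² / (2σ²), and the scan takes the maximum over s. The detection threshold is left as a user-supplied parameter. Choosing it well is exactly what the tuning described above addresses, and nothing here reproduces that tuning. The function names are hypothetical.

```python
import math


def glr_scan(xs, sigma=1.0):
    """Return (best split s, max GLR statistic) for one mean change in Gaussian data.

    Split s means the first segment is xs[:s] and the second is xs[s:].
    """
    n = len(xs)
    if n < 2:
        return None, 0.0
    prefix = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)  # prefix sums give O(1) segment means
    best_s, best_stat = None, -math.inf
    for s in range(1, n):
        m1 = prefix[s] / s
        m2 = (prefix[n] - prefix[s]) / (n - s)
        stat = s * (n - s) / n * (m1 - m2) ** 2 / (2.0 * sigma ** 2)
        if stat > best_stat:
            best_s, best_stat = s, stat
    return best_s, best_stat


def detect_change(xs, threshold, sigma=1.0):
    """Report the split position if the scanned GLR statistic exceeds `threshold`, else None."""
    s, stat = glr_scan(xs, sigma)
    return s if stat > threshold else None


if __name__ == "__main__":
    data = [0.0] * 10 + [1.0] * 10
    print(glr_scan(data))  # -> (10, 2.5)
```

In a sequential setting, the scan is repeated after every new observation. With prefix sums, each repetition costs O(n).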