X-Armed Bandits
TLDR
We consider a generalization of stochastic bandits where the set of arms, X, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function that is known to the decision maker.
  • 247
  • 35
Pure exploration in finitely-armed and continuous-armed bandits
TLDR
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms.
  • 174
  • 34
Pure Exploration in Multi-armed Bandits Problems
TLDR
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that perform an online exploration of the arms.
  • 289
  • 29
A second-order bound with excess losses
TLDR
We study online aggregation of the predictions of experts, and first show new second-order regret bounds in the standard setting, which are obtained via a version of the Prod algorithm with multiple learning rates.
  • 61
  • 19
Online Optimization in X-Armed Bandits
TLDR
We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space in a way that is more general than Lipschitz.
  • 184
  • 16
Explore First, Exploit Next: The True Shape of Regret in Bandit Problems
TLDR
We revisit lower bounds on the regret in the case of multi-armed bandit problems.
  • 86
  • 15
Lipschitz Bandits without the Lipschitz Constant
TLDR
We consider the setting of stochastic bandit problems with a continuum of arms indexed by [0, 1]^d.
  • 60
  • 14
Improved Second-Order Bounds for Prediction with Expert Advice
TLDR
This work studies external regret in sequential prediction games with arbitrary payoffs (nonnegative or non-positive).
  • 69
  • 13
A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences
TLDR
We consider a Kullback-Leibler-based algorithm for the stochastic multi-armed bandit problem in the case of distributions with finite supports (not necessarily known beforehand), whose asymptotic regret matches the lower bound of Burnetas and Katehakis (1996).
  • 98
  • 10
Regret Minimization Under Partial Monitoring
TLDR
We consider repeated games in which the player, instead of observing the action chosen by the opponent in each game round, receives feedback generated by the combined choice of the two players.
  • 66
  • 7