Learn More
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the largest mean in a multi-armed bandit game in the fixed confidence setting using a small number of total samples. The procedure cannot be improved in the sense that the number of samples required to identify the best arm is within a constant factor of a lower(More)
This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such(More)
—Low-dimensional embedding based on non-metric data (e.g., non-metric multidimensional scaling) is a problem that arises in many applications, especially those involving human subjects. This paper investigates the problem of learning an embedding of n objects into d-dimensional Euclidean space that is consistent with pairwise comparisons of the type "(More)
—This paper is concerned with identifying the arm with the highest mean in a multi-armed bandit problem using as few independent samples from the arms as possible. While the so-called " best arm problem " dates back to the 1950s, only recently were two qualitatively different algorithms proposed that achieve the optimal sample complexity for the problem.(More)
Sampling from distributions to find the one with the largest mean arises in a broad range of applications , and it can be mathematically modeled as a multi-armed bandit problem in which each distribution is associated with an arm. This paper studies the sample complexity of identifying the best arm (largest mean) in a multi-armed bandit problem. Motivated(More)
Motivated by the task of hyperparameter optimization , we introduce the non-stochastic best-arm identification problem. Within the multi-armed bandit literature, the cumulative regret objective enjoys algorithms and analyses for both the non-stochastic and stochastic settings while to the best of our knowledge, the best-arm identification framework has only(More)
Active learning methods automatically adapt data collection by selecting the most informative samples in order to accelerate machine learning. Because of this, real-world testing and comparing active learning algorithms requires collecting new datasets (adaptively), rather than simply applying algorithms to benchmark datasets, as is the norm in (passive)(More)
The dueling bandit problem is a variation of the classical multi-armed bandit in which the allowable actions are noisy comparisons between pairs of arms. This paper focuses on a new approach for finding the " best " arm according to the Borda criterion using noisy comparisons. We prove that in the absence of structural assumptions, the sample complexity of(More)
This paper examines the problem of ranking a collection of objects using pair-wise comparisons (rankings of two objects). In a companion paper in the regular NIPS 2011 program [1], we showed that if each object x ∈ R d is assigned a score f (x) = ||x − r|| for some unknown r ∈ R d , then our recently proposed active ranking algorithm can recover the ranking(More)