We introduce an efficient algorithm for the problem of online linear optimization in the bandit setting which achieves the optimal O*(√T) regret. The setting is a natural generalization of the non-stochastic multi-armed bandit problem, and the existence of an efficient optimal algorithm has been posed as an open problem in a number of recent papers. We…
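The non-stochastic multi-armed bandit mentioned above is the special case in which the decision set is a finite set of arms. A minimal sketch of that special case, assuming the classical Exp3-style strategy (exponential weights on importance-weighted loss estimates, mixed with uniform exploration), is shown below. It is not the algorithm of the abstract, which handles general convex decision sets; the function name `exp3`, the callback `loss_fn`, and the parameters `eta` and `gamma` are illustrative choices.

```python
import math
import random

def exp3(num_arms, num_rounds, loss_fn, eta, gamma, seed=0):
    """Exp3-style bandit learner over num_arms arms (illustrative sketch).

    loss_fn(t, arm) returns the loss in [0, 1] of the pulled arm at round t;
    only that single loss is revealed to the learner (bandit feedback).
    Returns the list of pulled arms and the total incurred loss.
    """
    rng = random.Random(seed)
    cum_est = [0.0] * num_arms  # importance-weighted cumulative loss estimates
    pulls, total_loss = [], 0.0
    for t in range(num_rounds):
        # Exponential weights on estimated losses (shifted by the min for stability).
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        z = sum(w)
        # Mix with uniform exploration so every arm has probability >= gamma / K.
        p = [(1 - gamma) * wi / z + gamma / num_arms for wi in w]
        arm = rng.choices(range(num_arms), weights=p)[0]
        loss = loss_fn(t, arm)
        total_loss += loss
        pulls.append(arm)
        # Unbiased estimate: only the pulled arm's estimate changes.
        cum_est[arm] += loss / p[arm]
    return pulls, total_loss
```

With three arms where arm 2 has loss 0.1 and the others 0.9, the learner ends up pulling arm 2 in the vast majority of late rounds, while the `gamma` floor keeps a small amount of exploration on the other arms.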

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one…
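The procedure described above (run SGD for T iterations, return the average point) can be sketched for a one-dimensional λ-strongly convex objective. This is an illustration under assumptions: the step size η_t = 1/(λt), the interval domain `[lo, hi]`, and the oracle `grad_sample` are choices made here, and the sketch returns both the last iterate and the uniform average without claiming anything about which of them the truncated result favors.

```python
import random

def sgd_strongly_convex(grad_sample, lam, w0, num_steps, lo, hi, seed=0):
    """Projected SGD on the interval [lo, hi] with step size 1 / (lam * t).

    grad_sample(w, rng) must return an unbiased stochastic gradient at w.
    Returns (last_iterate, average_of_iterates).
    """
    rng = random.Random(seed)
    w = w0
    running_sum = 0.0
    for t in range(1, num_steps + 1):
        g = grad_sample(w, rng)
        w = w - g / (lam * t)      # step size eta_t = 1 / (lam * t)
        w = min(max(w, lo), hi)    # Euclidean projection onto [lo, hi]
        running_sum += w           # accumulate iterates for the average point
    return w, running_sum / num_steps
```

For example, F(w) = E[(w − Z)²]/2 with Z ~ N(3, 1) is 1-strongly convex with minimizer 3, and its stochastic gradient is `w - z` for a fresh sample `z`; both returned points land close to 3.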

We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al. eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the…
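As a point of reference for the abstract, a minimal sketch of plain online gradient descent with step size η_t = 1/(Ht) on H-strongly convex losses is given below; with this step size the regret is at most (G²/2H)(1 + log T), where G bounds the gradients. Note that it *assumes* H is known in advance, which is exactly the a priori knowledge the abstract's extension removes; it is not Adaptive Online Gradient Descent. The quadratic losses f_t(x) = (H/2)(x − z_t)² on [−1, 1] and the function name are illustrative assumptions.

```python
def ogd_strongly_convex(targets, H, radius=1.0):
    """Online gradient descent on f_t(x) = (H/2) * (x - z_t)**2 over [-radius, radius].

    Uses step size eta_t = 1 / (H * t), which requires knowing H in advance.
    Returns the learner's regret against the best fixed point in hindsight.
    """
    x = 0.0
    learner_loss = 0.0
    for t, z in enumerate(targets, start=1):
        learner_loss += 0.5 * H * (x - z) ** 2   # play x_t, then observe f_t
        grad = H * (x - z)
        x = x - grad / (H * t)                   # gradient step with eta_t = 1/(H t)
        x = min(max(x, -radius), radius)         # project back onto the interval
    # The clipped mean of the targets minimizes the sum of the quadratics on the interval.
    best = min(max(sum(targets) / len(targets), -radius), radius)
    best_loss = sum(0.5 * H * (best - z) ** 2 for z in targets)
    return learner_loss - best_loss
```

With targets in [−1, 1] and H = 2, every gradient is bounded by G = 4, so over T = 10,000 rounds the regret stays below 4(1 + ln 10000) ≈ 40.8, a vanishing fraction of T.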

We present methods for online linear optimization that take advantage of benign (as opposed to worst-case) sequences. Specifically, if the sequence encountered by the learner is described well by a known "predictable process", the algorithms presented enjoy tighter bounds as compared to the typical worst-case bounds. Additionally, the methods achieve the…
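To make the idea of exploiting a predictable process concrete, here is a minimal Euclidean sketch of an optimistic update: the learner plays a step from its secondary iterate along a guess M_t of the next gradient, then updates that iterate with the true gradient. The unit-ball domain, the step size `eta`, and the choice M_t = g_{t−1} (the previous gradient) are assumptions for illustration; the methods in the abstract allow general regularizers and predictable processes.

```python
import math

def project_ball(v, radius=1.0):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in v))
    return list(v) if norm <= radius else [c * radius / norm for c in v]

def optimistic_ogd(gradients, eta, use_hint=True):
    """Optimistic online gradient descent for linear losses <g_t, x> on the unit ball.

    Hint M_t = g_{t-1}; with use_hint=False, M_t = 0 and the method reduces to
    plain projected online gradient descent. Returns the regret against the
    best fixed point in the ball.
    """
    dim = len(gradients[0])
    y = [0.0] * dim     # secondary iterate, updated with true gradients
    hint = [0.0] * dim  # predictable-sequence guess M_t
    total = 0.0
    for g in gradients:
        # Play x_t: a step from y_{t-1} along the hint.
        x = project_ball([yi - eta * mi for yi, mi in zip(y, hint)])
        total += sum(gi * xi for gi, xi in zip(g, x))
        # Update the secondary iterate with the observed gradient.
        y = project_ball([yi - eta * gi for yi, gi in zip(y, g)])
        if use_hint:
            hint = list(g)
    # Best fixed point for linear losses on the unit ball is -G / ||G||.
    G = [sum(col) for col in zip(*gradients)]
    best = -math.sqrt(sum(c * c for c in G))
    return total - best
```

On a perfectly predictable sequence (the constant gradient (1, 0) for 1000 rounds, η = 0.1), plain OGD has regret 5.5 while the optimistic variant has regret 4.6, because from the second round on it already moves in the direction of the gradient it is about to see.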

We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary's…

We provide a principled way of proving Õ(√T) high-probability guarantees for partial-information (bandit) problems over arbitrary convex decision sets. First, we prove a regret guarantee for the full-information problem in terms of "local" norms, both for entropy and self-concordant barrier regularization, unifying these methods. Given one of such…

We show a principled way of deriving online learning algorithms from a minimax analysis. Various upper bounds on the minimax value, previously thought to be non-constructive, are shown to yield algorithms. This allows us to seamlessly recover known methods and to derive new ones, also capturing such "unorthodox" methods as Follow the Perturbed Leader and…
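For readers unfamiliar with Follow the Perturbed Leader, a minimal sketch for the experts setting follows: each round the learner adds fresh random noise to the cumulative losses and plays the resulting leader. The uniform perturbation on [0, 1/ε] follows the classical Kalai–Vempala style; the function name and `epsilon` are illustrative, and the sketch does not reproduce the minimax derivation described in the abstract.

```python
import random

def follow_the_perturbed_leader(loss_vectors, epsilon, seed=0):
    """FTPL over N experts with uniform perturbations on [0, 1/epsilon].

    Each round, pick the expert minimizing (cumulative loss - fresh noise),
    then observe the full loss vector. Returns the chosen experts and the
    learner's total loss.
    """
    rng = random.Random(seed)
    n = len(loss_vectors[0])
    cumulative = [0.0] * n
    choices, total = [], 0.0
    for losses in loss_vectors:
        perturbed = [c - rng.uniform(0.0, 1.0 / epsilon) for c in cumulative]
        i = min(range(n), key=perturbed.__getitem__)  # the perturbed leader
        choices.append(i)
        total += losses[i]
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return choices, total
```

When one expert is consistently better (loss 0.2 against 0.8), the noise of width 1/ε = 10 can only mislead the learner while the cumulative gap is below 10, so after the first 17 rounds FTPL always follows the best expert.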

We phrase K-means clustering as an empirical risk minimization procedure over a class H_K and explicitly calculate the covering number for this class. Next, we show that stability of K-means clustering is characterized by the geometry of H_K with respect to the underlying distribution. We prove that in the case of a unique global minimizer, the clustering…
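The empirical risk that K-means minimizes over H_K is the average squared distance from each sample to its nearest of K centers. The sketch below computes that quantity and runs Lloyd's heuristic, a swapped-in local search that never increases the empirical risk but is only guaranteed to reach a local minimizer; the function names and the explicit `init_centers` argument are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - c) ** 2 for p, c in zip(a, b))

def empirical_risk(points, centers):
    """Average squared distance from each point to its nearest center."""
    return sum(min(sq_dist(x, mu) for mu in centers) for x in points) / len(points)

def lloyd(points, init_centers, iterations=20):
    """Lloyd's heuristic: alternate nearest-center assignment and mean updates.

    The empirical risk does not increase between iterations, but the result
    depends on init_centers and may be a poor local minimizer.
    """
    centers = [list(c) for c in init_centers]
    k = len(centers)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in points:
            # Ties go to the lowest-index center.
            j = min(range(k), key=lambda idx: sq_dist(x, centers[idx]))
            clusters[j].append(x)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers
```

On the eight points {0, 1}² ∪ {10, 11}², starting from one center in each group gives centers (0.5, 0.5) and (10.5, 10.5) with empirical risk 0.5, whereas starting from (0, 1) and (1, 0) the iterations stall at a partition with empirical risk about 50.3, a reminder that the global empirical risk minimizer is not what Lloyd's method necessarily returns.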

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8] whose regret is bounded in expectation against an oblivious adversary. We obtain the same…

We provide several applications of Optimistic Mirror Descent, an online learning algorithm based on the idea of predictable sequences. First, we recover the Mirror Prox algorithm for offline optimization, prove an extension to Hölder-smooth functions, and apply the results to saddle-point type problems. Next, we prove that a version of Optimistic Mirror…
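A toy saddle-point problem shows why using a predicted gradient helps. The sketch assumes the bilinear objective f(x, y) = xy (saddle point at the origin) and unconstrained Euclidean steps, so it is an optimistic gradient descent-ascent illustration rather than Mirror Prox with a general regularizer. The optimistic variant uses the previous gradient as the prediction, which amounts to stepping along 2g_t − g_{t−1}; the function name and parameters are hypothetical.

```python
import math

def gda_bilinear(x, y, eta, steps, optimistic):
    """Simultaneous descent on x / ascent on y for f(x, y) = x * y.

    With optimistic=True each step uses 2*g_t - g_{t-1}, i.e. the previous
    gradient serves as the prediction of the next one. Returns the distance
    of the final point from the saddle point (0, 0).
    """
    gx_prev, gy_prev = y, x  # gradients at the start: df/dx = y, df/dy = x
    for _ in range(steps):
        gx, gy = y, x        # current gradients of f(x, y) = x * y
        if optimistic:
            x, y = x - eta * (2 * gx - gx_prev), y + eta * (2 * gy - gy_prev)
        else:
            x, y = x - eta * gx, y + eta * gy
        gx_prev, gy_prev = gx, gy
    return math.hypot(x, y)
```

Starting from (1, 1) with η = 0.1, plain simultaneous gradient descent-ascent spirals outward (each step multiplies the distance to the origin by √1.01), while the optimistic version contracts toward the saddle point.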