• Corpus ID: 3264230

Online convex optimization in the bandit setting: gradient descent without a gradient

@article{Flaxman2005OnlineCO,
title={Online convex optimization in the bandit setting: gradient descent without a gradient},
author={Abraham D. Flaxman and Adam Tauman Kalai and H. B. McMahan},
journal={ArXiv},
year={2005},
volume={cs.LG/0408007}
}
• Published 2 August 2004
• Computer Science
• ArXiv
We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c_1, c_2, …, and in each period, we choose a feasible point x_t in S, and learn the cost c_t(x_t). If the function c_t is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to get regret bounds of …
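The paper's central trick is estimating a gradient from a single function evaluation per round. A minimal sketch of that idea follows; the step size, perturbation radius, feasible set, and cost functions here are illustrative choices, not the paper's exact parameters (the paper also projects onto a slightly shrunken copy of S, which is omitted here).

```python
import numpy as np

def one_point_gradient_estimate(cost, x, delta, rng):
    """Estimate grad c(x) from a SINGLE evaluation of c.

    Samples a uniformly random unit vector u and returns
    (d / delta) * c(x + delta * u) * u, an unbiased estimate of the
    gradient of a smoothed version of c.
    """
    d = len(x)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)              # uniform direction on the unit sphere
    return (d / delta) * cost(x + delta * u) * u

def bandit_gradient_descent(cost_fns, x0, radius, eta=0.01, delta=0.2, seed=0):
    """Projected online gradient descent using only bandit feedback.

    After each step the iterate is projected back onto the feasible
    set S, taken here to be the Euclidean ball of the given radius.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for c in cost_fns:
        g = one_point_gradient_estimate(c, x, delta, rng)
        x = x - eta * g
        norm = np.linalg.norm(x)
        if norm > radius:               # Euclidean projection onto the ball
            x *= radius / norm
    return x

# Toy run: quadratic costs centred at (1, 1); feasible set = ball of radius 2
costs = [lambda z: float(np.sum((z - 1.0) ** 2))] * 500
x_final = bandit_gradient_descent(costs, np.zeros(2), radius=2.0)
```

For a linear cost c(z) = a·z at x = 0 the estimator reduces to d(a·u)u, whose expectation is exactly a, which is why averaging many samples recovers the true gradient.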
607 Citations
Distributed Online Optimization With Long-Term Constraints
• Computer Science
IEEE Transactions on Automatic Control
• 2022
The proposed regret scalings match those obtained by state-of-the-art algorithms and fundamental limits in the corresponding centralized online optimization problem (for both convex and strongly convex loss functions).
On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization
• O. Shamir
• Computer Science, Mathematics
COLT
• 2013
The attainable error/regret in the bandit and derivative-free settings, as a function of the dimension d and the available number of queries T, is investigated, and a precise characterization of the attainable performance for strongly-convex and smooth functions is provided.
Optimization of Smooth Functions With Noisy Observations: Local Minimax Rates
• Computer Science
IEEE Transactions on Information Theory
• 2019
It is shown that for functions with fast growth around their global minima, carefully designed optimization algorithms can identify a near global minimizer with many fewer queries than worst-case global minimax theory predicts.
Online Convex Optimization with Continuous Switching Constraint
• Computer Science
NeurIPS
• 2021
The essential idea is to carefully design an adaptive adversary who can adjust the loss function according to the cumulative switching cost the player has incurred so far; based on this orthogonal technique, a simple gradient-based algorithm is developed which enjoys the minimax optimal regret bound.
Logarithmic regret algorithms for online convex optimization
• Computer Science
Machine Learning
• 2007
Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Logarithmic Regret Algorithms for Online Convex Optimization
• Computer Science
COLT
• 2006
This paper proposes several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement, and gives an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Regret bounded by gradual variation for online convex optimization
• Computer Science
Machine Learning
• 2013
This paper presents two novel algorithms that bound the regret of the Follow the Regularized Leader algorithm by the gradual variation of cost functions, and develops a deterministic algorithm for online bandit optimization in multipoint bandit setting.
Improved Regret Bounds for Projection-free Bandit Convex Optimization
• Computer Science
AISTATS
• 2020
The challenge of designing online algorithms for the bandit convex optimization problem (BCO) is revisited, and the first such algorithm is presented, with bounds on its expected regret and on the overall number of calls to the linear optimization oracle, in expectation, where T is the number of prediction rounds.
Online strongly convex optimization with unknown delays
• Computer Science, Mathematics
Mach. Learn.
• 2022
This is the first work that solves online strongly convex optimization under the general delayed setting, by combining the approach with the classical (n+1)-point and two-point gradient estimators, where n is the dimensionality.
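The two-point gradient estimator mentioned in this entry queries the cost symmetrically around the current point; the difference of the two queries cancels the common function value, giving far lower variance than a single-query estimate. A minimal illustration (not the paper's exact scheme; names and parameters are hypothetical):

```python
import numpy as np

def two_point_gradient_estimate(cost, x, delta, rng):
    """Symmetric two-point gradient estimate.

    Queries the cost at x + delta*u and x - delta*u for a random unit
    vector u; the difference cancels the common value c(x), which is
    why two-point estimators are far less noisy than one-point ones.
    """
    d = len(x)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)               # uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (cost(x + delta * u) - cost(x - delta * u)) * u

# For a linear cost the estimate is exact in expectation: averaging many
# samples recovers the true gradient a.
rng = np.random.default_rng(0)
a = np.array([3.0, -1.0, 2.0])
linear = lambda z: float(a @ z)
est = np.mean([two_point_gradient_estimate(linear, np.zeros(3), 0.1, rng)
               for _ in range(20000)], axis=0)
```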
Minimizing Regret in Bandit Online Optimization in Unconstrained and Constrained Action Spaces
• Computer Science, Mathematics
ArXiv
• 2018
This work presents a novel algorithm to minimize the regret in both unconstrained and constrained action spaces and hinges on a classical idea of one-point estimation of the gradients of the cost functions based on their observed values.

References

Showing 1–10 of 40 references
• Computer Science, Mathematics
COLT
• 2004
This paper gives an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala, for the case of an adaptive adversary, and achieves a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln T})\).
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
An algorithm for convex programming is introduced, and it is shown that it is really a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
Nearly Tight Bounds for the Continuum-Armed Bandit Problem
This work considers the case when the set of strategies is a subset of ℝ^d and the cost functions are continuous, and improves on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor.
Universal Portfolios
We exhibit an algorithm for portfolio selection that asymptotically outperforms the best stock in the market. Let x_i = (x_{i1}, x_{i2}, …, x_{im})^t denote the performance of the stock market on …
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches
• Computer Science
STOC '04
• 2004
A second algorithm for online shortest paths is presented, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph; this approach has several advantages over the online linear optimization approach.
Gambling in a rigged casino: The adversarial multi-armed bandit problem
• Computer Science
Proceedings of IEEE 36th Annual Foundations of Computer Science
• 1995
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.
Efficient algorithms for universal portfolios
• Computer Science
Proceedings 41st Annual Symposium on Foundations of Computer Science
• 2000
This work presents an efficient implementation of the Universal algorithm based on rapidly mixing non-uniform random walks; the implementation also extends to non-financial applications of the Universal algorithm, such as data compression and language modeling.
• Computer Science
Inf. Comput.
• 1997
The bounds suggest that the losses of the algorithms are in general incomparable, but EG(±) has a much smaller loss if only a few components of the input are relevant for the predictions; this bound is quite tight already on simple artificial data.
Introduction to Stochastic Search and Optimization. Estimation, Simulation, and Control (Spall, J.C.
• Computer Science
• 2007
This comprehensive book offers 504 main pages divided into 17 chapters, covering multivariate analysis, basic tests in statistics, probability theory and convergence, random number generators and Markov processes, and over 250 exercises.
Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control
This book is suitable for a short course due to its expository nature and the material covered is of current interest, the informal tone is pleasing to the reader, and the author provides several insightful comments.