# Online convex optimization in the bandit setting: gradient descent without a gradient

```bibtex
@article{Flaxman2005OnlineCO,
  title   = {Online convex optimization in the bandit setting: gradient descent without a gradient},
  author  = {Abraham D. Flaxman and Adam Tauman Kalai and H. B. McMahan},
  journal = {ArXiv},
  year    = {2005},
  volume  = {cs.LG/0408007}
}
```

We study a general online convex optimization problem. We have a convex set $S$ and an unknown sequence of cost functions $c_1, c_2, \ldots$, and in each period, we choose a feasible point $x_t$ in $S$ and learn the cost $c_t(x_t)$. If the function $c_t$ is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to get regret bounds of …
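The "gradient descent without a gradient" idea in the abstract can be made concrete: the only feedback per round is the scalar value $c_t(x_t)$, so the algorithm evaluates the cost at a randomly perturbed point and uses that single number to build an unbiased estimate of the gradient of a smoothed version of $c_t$. A minimal sketch in Python (function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def one_point_gradient(cost, x, delta, rng):
    """One-point gradient estimate: (d/delta) * c(x + delta*u) * u,
    with u drawn uniformly from the unit sphere. This is an unbiased
    estimate of the gradient of a delta-smoothed version of c."""
    d = len(x)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)           # uniform direction on the unit sphere
    return (d / delta) * cost(x + delta * u) * u

def bandit_gradient_descent(costs, x0, eta, delta, project, rng):
    """Online gradient descent using only bandit (one-point) feedback:
    each round queries one cost value and steps against the estimate."""
    x = np.asarray(x0, dtype=float)
    iterates = []
    for c in costs:                  # one unknown cost function per round
        g = one_point_gradient(c, x, delta, rng)
        x = project(x - eta * g)     # stay inside the feasible set S
        iterates.append(x.copy())
    return iterates
```

For a linear cost $c(x) = b^\top x$ the estimator is exactly unbiased, so averaging many one-point estimates recovers $b$. Because each round sees only one noisy scalar, the achievable expected regret degrades from $O(\sqrt{T})$ in the full-information setting to roughly $O(T^{3/4})$, which is the trade-off this paper analyzes.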

## 607 Citations

Distributed Online Optimization With Long-Term Constraints

- Computer Science · IEEE Transactions on Automatic Control
- 2022

The proposed regret scalings match those obtained by state-of-the-art algorithms and fundamental limits in the corresponding centralized online optimization problem (for both convex and strongly convex loss functions).

On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization

- Computer Science, Mathematics · COLT
- 2013

The attainable error/regret in the bandit and derivative-free settings, as a function of the dimension d and the available number of queries T is investigated, and a precise characterization of the attainable performance for strongly-convex and smooth functions is provided.

Optimization of Smooth Functions With Noisy Observations: Local Minimax Rates

- Computer Science · IEEE Transactions on Information Theory
- 2019

It is shown that for functions with fast growth around their global minima, carefully designed optimization algorithms can identify a near global minimizer with many fewer queries than worst-case global minimax theory predicts.

Online Convex Optimization with Continuous Switching Constraint

- Computer Science · NeurIPS
- 2021

The essential idea is to carefully design an adaptive adversary who can adjust the loss function according to the cumulative switching cost the player has incurred so far; based on an orthogonal technique, a simple gradient-based algorithm is developed that enjoys the minimax optimal regret bound.

Logarithmic regret algorithms for online convex optimization

- Computer Science · Machine Learning
- 2007

Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.

Logarithmic Regret Algorithms for Online Convex Optimization

- Computer Science · COLT
- 2006

This paper proposes several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement, and gives an efficient algorithm based on the Newton method for optimization, a new tool in the field.

Regret bounded by gradual variation for online convex optimization

- Computer Science · Machine Learning
- 2013

This paper presents two novel algorithms that bound the regret of the Follow the Regularized Leader algorithm by the gradual variation of the cost functions, and develops a deterministic algorithm for online bandit optimization in the multi-point bandit setting.

Improved Regret Bounds for Projection-free Bandit Convex Optimization

- Computer Science · AISTATS
- 2020

The challenge of designing online algorithms for the bandit convex optimization problem (BCO) is revisited, and the first such algorithm attaining the improved expected regret bound is presented, using only a reduced number of overall calls to the linear optimization oracle in expectation, where T is the number of prediction rounds.

Online strongly convex optimization with unknown delays

- Computer Science, Mathematics · Mach. Learn.
- 2022

This is the first work to solve online strongly convex optimization under the general delayed setting, obtained by combining with the classical $(n+1)$-point and two-point gradient estimators, where n is the dimensionality.
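The two-point estimator mentioned here trades one extra query per round for much lower variance: the cost is queried at two symmetric perturbations, and the scaled difference approximates a directional derivative. A hedged sketch (names are illustrative):

```python
import numpy as np

def two_point_gradient(cost, x, delta, rng):
    """Two-point gradient estimate:
    (d / (2*delta)) * (c(x + delta*u) - c(x - delta*u)) * u,
    with u uniform on the unit sphere. The difference cancels the
    zeroth-order term of the cost, so the variance stays bounded as
    delta -> 0, unlike the one-point estimator."""
    d = len(x)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)           # uniform direction on the unit sphere
    return (d / (2.0 * delta)) * (cost(x + delta * u) - cost(x - delta * u)) * u
```

For smooth convex losses this variance reduction is what lets two-point bandit methods approach full-information-style regret rates, which is why delayed-feedback analyses such as the one above build on these classical estimators.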

Minimizing Regret in Bandit Online Optimization in Unconstrained and Constrained Action Spaces

- Computer Science, Mathematics · ArXiv
- 2018

This work presents a novel algorithm to minimize the regret in both unconstrained and constrained action spaces and hinges on a classical idea of one-point estimation of the gradients of the cost functions based on their observed values.

## References

Showing 1–10 of 40 references

Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary

- Computer Science, Mathematics · COLT
- 2004

This paper gives an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala, for the case of an adaptive adversary, and achieves a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln T})\).

Online Convex Programming and Generalized Infinitesimal Gradient Ascent

- Computer Science · ICML
- 2003

An algorithm for convex programming is introduced and shown to be a generalization of infinitesimal gradient ascent; the results imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.

Nearly Tight Bounds for the Continuum-Armed Bandit Problem

- Computer Science, Mathematics · NIPS
- 2004

This work considers the case when the set of strategies is a subset of ℝd, and the cost functions are continuous, and improves on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor.

Universal Portfolios

- Mathematics
- 1996

We exhibit an algorithm for portfolio selection that asymptotically outperforms the best stock in the market. Let $x_i = (x_{i1}, x_{i2}, \ldots, x_{im})^t$ denote the performance of the stock market on…

Adaptive routing with end-to-end feedback: distributed learning and geometric approaches

- Computer Science · STOC '04
- 2004

A second algorithm for online shortest paths is presented, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph, and has several advantages over the online linear optimization approach.

Gambling in a rigged casino: The adversarial multi-armed bandit problem

- Computer Science · Proceedings of IEEE 36th Annual Foundations of Computer Science
- 1995

A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.

Efficient algorithms for universal portfolios

- Computer Science · Proceedings 41st Annual Symposium on Foundations of Computer Science
- 2000

This work presents an efficient implementation of the Universal algorithm based on non-uniform random walks that are rapidly mixing; the implementation also works for non-financial applications of the Universal algorithm, such as data compression and language modeling.

Exponentiated Gradient Versus Gradient Descent for Linear Predictors

- Computer Science · Inf. Comput.
- 1997

The bounds suggest that the losses of the algorithms are in general incomparable, but EG(±) has a much smaller loss if only a few components of the input are relevant for the predictions; this bound is quite tight already on simple artificial data.

Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control (Spall, J.C.)

- Computer Science
- 2007

This comprehensive book offers 504 main pages divided into 17 chapters, covering multivariate analysis, basic tests in statistics, probability theory and convergence, random number generators and Markov processes, and over 250 exercises.

Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control

- Computer Science · Technometrics
- 2004

This book is suitable for a short course due to its expository nature and the material covered is of current interest, the informal tone is pleasing to the reader, and the author provides several insightful comments.