Corpus ID: 3264230

Online convex optimization in the bandit setting: gradient descent without a gradient

@article{Flaxman2005OnlineCO,
  title={Online convex optimization in the bandit setting: gradient descent without a gradient},
  author={Abraham D. Flaxman and Adam Tauman Kalai and H. B. McMahan},
  journal={ArXiv},
  year={2005},
  volume={cs.LG/0408007}
}
We study a general online convex optimization problem. We have a convex set S and an unknown sequence of cost functions c_1, c_2, ..., and in each period, we choose a feasible point x_t in S, and learn the cost c_t(x_t). If the function c_t is also revealed after each period then, as Zinkevich shows in [25], gradient descent can be used on these functions to get regret bounds of …
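A minimal sketch of the "gradient descent without a gradient" idea in the title, under simplifying assumptions: the feasible set is taken to be a Euclidean ball, the step size eta and perturbation radius delta are fixed by hand, and cost_oracle is a hypothetical callback that returns only the scalar value c_t(x_t). The learner plays x_t + delta*u_t for a uniformly random unit vector u_t and feeds (d/delta)*c_t(x_t + delta*u_t)*u_t to ordinary projected gradient descent, since that vector is an unbiased estimate of the gradient of a smoothed version of c_t. This is a sketch of the general technique, not the paper's exact algorithm or constants.

```python
import numpy as np

def fkm_bandit_gradient_descent(cost_oracle, dim, T, eta=0.01, delta=0.1, radius=1.0):
    """One-point bandit gradient descent in the spirit of Flaxman, Kalai, and McMahan.

    cost_oracle(t, x) is a hypothetical callback returning the scalar cost c_t(x);
    the learner never sees the function c_t itself. The feasible set is assumed to
    be the Euclidean ball of the given radius, so projection is a simple rescaling.
    """
    x = np.zeros(dim)                      # current iterate, kept strictly inside the ball
    total_cost = 0.0
    for t in range(T):
        u = np.random.randn(dim)
        u /= np.linalg.norm(u)             # uniformly random direction on the unit sphere
        y = x + delta * u                  # the point actually played this round
        cost = cost_oracle(t, y)           # only this single scalar value is observed
        total_cost += cost
        g_hat = (dim / delta) * cost * u   # one-point estimate of the gradient of a smoothed c_t
        x = x - eta * g_hat                # gradient-descent step on the estimate
        # project onto a slightly shrunken ball so x + delta*u remains feasible next round
        max_norm = radius - delta
        norm_x = np.linalg.norm(x)
        if norm_x > max_norm:
            x = x * (max_norm / norm_x)
    return total_cost
```

The shrunken projection radius mirrors the fact that the perturbed point, not the iterate itself, must lie in the feasible set; the paper handles this by rescaling the feasible region.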
On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization
  • O. Shamir
  • Computer Science, Mathematics
    COLT
  • 2013
TLDR
The attainable error/regret in the bandit and derivative-free settings is investigated as a function of the dimension d and the available number of queries T, and a precise characterization of the attainable performance for strongly convex and smooth functions is provided.
Optimization of Smooth Functions With Noisy Observations: Local Minimax Rates
TLDR
It is shown that for functions with fast growth around their global minima, carefully designed optimization algorithms can identify a near global minimizer with many fewer queries than worst-case global minimax theory predicts.
Online Convex Optimization with Continuous Switching Constraint
TLDR
The essential idea is to carefully design an adaptive adversary who can adjust the loss function according to the cumulative switching cost the player has incurred so far and, using an orthogonal technique, to develop a simple gradient-based algorithm that enjoys the minimax optimal regret bound.
Logarithmic regret algorithms for online convex optimization
TLDR
Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Logarithmic Regret Algorithms for Online Convex Optimization
TLDR
This paper proposes several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement, and gives an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Regret bounded by gradual variation for online convex optimization
TLDR
This paper presents two novel algorithms that bound the regret of the Follow the Regularized Leader algorithm by the gradual variation of the cost functions, and develops a deterministic algorithm for online bandit optimization in the multi-point bandit setting.
Improved Regret Bounds for Projection-free Bandit Convex Optimization
TLDR
The challenge of designing online algorithms for the bandit convex optimization problem (BCO) is revisited, and an algorithm is presented that attains an improved expected regret bound using only a limited number of overall calls to the linear optimization oracle, in expectation, where T is the number of prediction rounds.
Online strongly convex optimization with unknown delays
TLDR
This is the first work to solve online strongly convex optimization in the general delayed setting, by combining the approach with the classical (n+1)-point and two-point gradient estimators, where n is the dimensionality.
Minimizing Regret in Bandit Online Optimization in Unconstrained and Constrained Action Spaces
TLDR
This work presents a novel algorithm that minimizes regret in both unconstrained and constrained action spaces; it hinges on the classical idea of one-point estimation of the gradients of the cost functions from their observed values.
Minimizing Regret of Bandit Online Optimization in Unconstrained Action Spaces
TLDR
This work presents a novel algorithm to minimize regret in unconstrained action spaces, building on the classical idea of one-point estimation of the gradients of the cost functions from their observed values; the algorithm is independent of problem parameters.
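For context, the one-point gradient estimate mentioned in the two entries above, and introduced in the Flaxman–Kalai–McMahan paper, rests on a standard smoothing identity. Writing \(\hat{c}(x) = \mathbb{E}_{v \sim \mathbb{B}^d}[c(x + \delta v)]\) for the average of \(c\) over the ball of radius \(\delta\) around \(x\),

\[
\nabla \hat{c}(x) \;=\; \frac{d}{\delta}\, \mathbb{E}_{u \sim \mathbb{S}^{d-1}}\big[\, c(x + \delta u)\, u \,\big],
\]

so a single observed value \(c(x + \delta u)\), scaled by \(\frac{d}{\delta}\, u\), is an unbiased estimate of the gradient of the smoothed function \(\hat{c}\).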

References

Showing 1–10 of 40 references
Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary
TLDR
This paper gives an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala, for the case of an adaptive adversary, and achieves a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln(T)})\).
Online Convex Programming and Generalized Infinitesimal Gradient Ascent
TLDR
An algorithm for convex programming is introduced and shown to be a generalization of infinitesimal gradient ascent, and the results here imply that generalized infinitesimal gradient ascent (GIGA) is universally consistent.
Nearly Tight Bounds for the Continuum-Armed Bandit Problem
TLDR
This work considers the case when the set of strategies is a subset of ℝ^d and the cost functions are continuous, and improves on the best-known upper and lower bounds, closing the gap to a sublogarithmic factor.
Universal Portfolios
We exhibit an algorithm for portfolio selection that asymptotically outperforms the best stock in the market. Let x_i = (x_{i1}, x_{i2}, ..., x_{im})^t denote the performance of the stock market on …
Adaptive routing with end-to-end feedback: distributed learning and geometric approaches
TLDR
A second algorithm for online shortest paths is presented, which solves the shortest-path problem using a chain of online decision oracles, one at each node of the graph, which has several advantages over the online linear optimization approach.
Gambling in a rigged casino: The adversarial multi-armed bandit problem
TLDR
A solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs is given.
Efficient algorithms for universal portfolios
  • A. Kalai, S. Vempala
  • Computer Science, Mathematics
    Proceedings 41st Annual Symposium on Foundations of Computer Science
  • 2000
TLDR
This work presents an efficient implementation of the Universal algorithm based on non-uniform random walks that are rapidly mixing; it also works for non-financial applications of the Universal algorithm, such as data compression and language modeling.
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
TLDR
The bounds suggest that the losses of the algorithms are in general incomparable, but EG(+/-) has a much smaller loss if only a few components of the input are relevant for the predictions; the bounds are quite tight already on simple artificial data.
Introduction to Stochastic Search and Optimization. Estimation, Simulation, and Control (Spall, J.C.)
TLDR
This comprehensive book offers 504 main pages divided into 17 chapters and over 250 exercises, covering multivariate analysis, basic tests in statistics, probability theory and convergence, and random number generators and Markov processes.
Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control
TLDR
This book is suitable for a short course due to its expository nature; the material covered is of current interest, the informal tone is pleasing to the reader, and the author provides several insightful comments.