# Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization

```bibtex
@article{Saha2021OptimalRA,
  title   = {Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization},
  author  = {Aadirupa Saha and Nagarajan Natarajan and Praneeth Netrapalli and Prateek Jain},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2102.07387}
}
```

We study online learning with bandit feedback (i.e. the learner has access only to a zeroth-order oracle) where the cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e. $f_t(w) = \ell_t(g_t(w))$ where the output of $g_t$ is one-dimensional. At each round, the learner observes a context $x_t$, plays a prediction $g_t(w_t; x_t)$ (e.g. $g_t(\cdot) = \langle x_t, \cdot \rangle$) for some $w_t \in \mathbb{R}^d$, and observes the loss $\ell_t(g_t(w_t))$, where $\ell_t$ is a convex Lipschitz-continuous function. The goal is to minimize the standard regret metric. This pseudo-1d bandit…
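The pseudo-1d round structure described in the abstract can be sketched in a few lines. The helper below is purely illustrative (the perturbation scale `delta`, the step size `eta`, and the crude one-point gradient estimate are assumptions of the sketch, not the algorithm from the paper):

```python
import random

def pseudo1d_bandit_round(w, x, loss, delta=0.1, eta=0.01):
    """One illustrative round with bandit feedback under the pseudo-1d
    structure f_t(w) = loss(<x, w>): the perturbation is injected into
    the single scalar output, so one random sign suffices regardless of
    the dimension of w."""
    s = random.choice([-1.0, 1.0])            # one-point perturbation sign
    y = sum(wi * xi for wi, xi in zip(w, x))  # scalar prediction g_t(w) = <x, w>
    observed = loss(y + delta * s)            # only zeroth-order feedback is seen
    g_scalar = (observed / delta) * s         # crude estimate of loss'(y)
    # chain rule: grad_w f_t(w) ~ loss'(y) * x
    w_next = [wi - eta * g_scalar * xi for wi, xi in zip(w, x)]
    return w_next, observed
```

Because $g_t$ is scalar-valued, the exploration noise lives in one dimension rather than in all $d$ coordinates of $w$; this is the structural property that lets the pseudo-1d setting beat generic bandit convex optimization rates.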


## 2 Citations

Minimax Regret for Bandit Convex Optimisation of Ridge Functions

- Computer Science, Mathematics · ArXiv
- 2021

A short information-theoretic proof is provided that the minimax regret is at most $O(d\sqrt{n}\log(n\,\mathrm{diam}(K)))$, where $n$ is the number of interactions, $d$ the dimension, and $\mathrm{diam}(K)$ the diameter of the constraint set.

A Survey of Decentralized Online Learning

- Computer Science · ArXiv
- 2022

A thorough overview of decentralized online learning (DOL) is provided from the perspective of problem settings, communication, computation, and performance, and some potential future directions are discussed in detail.

## References

Showing 1–10 of 25 references.

Logarithmic regret algorithms for online convex optimization

- Computer Science · Machine Learning
- 2007

Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.

Kernel-based methods for bandit convex optimization

- Computer Science · STOC
- 2017

We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n) √T-regret for this problem. To do so we introduce three new ideas in the derivative-free…

Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback

- Computer Science, Mathematics · AISTATS
- 2011

The first algorithm whose expected regret is $O(T^{2/3})$, ignoring constant and logarithmic factors, is given, building upon existing work on self-concordant regularizers and one-point gradient estimation.
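The one-point gradient estimation mentioned above is a standard zeroth-order technique: query the function once at a randomly perturbed point and rescale. A minimal sketch, with an illustrative smoothing radius `delta` (the function `f` and all constants here are assumptions of the sketch):

```python
import random, math

def one_point_gradient_estimate(f, w, delta=0.05):
    """One-point gradient estimate: a single query of f at a randomly
    perturbed point gives (d/delta) * f(w + delta*u) * u, an unbiased
    estimate of the gradient of a delta-smoothed version of f."""
    d = len(w)
    # sample u uniformly from the unit sphere via Gaussian normalization
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(gi * gi for gi in g))
    u = [gi / norm for gi in g]
    fval = f([wi + delta * ui for wi, ui in zip(w, u)])  # the only oracle call
    return [(d / delta) * fval * ui for ui in u]
```

The appeal in the bandit setting is that a single function evaluation per round doubles as both the played action's loss and the exploration signal; the price is high variance, which is where self-concordant regularizers help.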

Optimistic Bandit Convex Optimization

- Computer Science · NIPS
- 2016

This is the first algorithm admitting both a polynomial time complexity and a regret that is polynomial in the dimension of the action space; it improves upon the original regret bound for Lipschitz loss functions, achieving a regret of $\widetilde O(T^{11/16}d^{3/8})$.

On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization

- Computer Science, Mathematics · COLT
- 2013

The attainable error/regret in the bandit and derivative-free settings, as a function of the dimension $d$ and the available number of queries $T$, is investigated, and a precise characterization of the attainable performance for strongly-convex and smooth functions is provided.

On the Complexity of Bandit Linear Optimization

- Computer Science · COLT
- 2015

It is shown that the price of bandit information in this setting can be as large as $d$, disproving the well-known conjecture that the regret for bandit linear optimization is at most $\sqrt{d}$ times the full-information regret.

Stochastic Convex Optimization with Bandit Feedback

- Computer Science, Mathematics · SIAM J. Optim.
- 2011

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model, and demonstrates a generalization of the ellipsoid algorithm that incurs $O(\mathrm{poly}(d)\sqrt{T})$ regret.

Multi-scale exploration of convex functions and bandit convex optimization

- Computer Science · COLT
- 2016

This paper uses a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function, to solve a decade-old open problem in adversarial bandit convex optimization.

Projection-Free Bandit Convex Optimization

- Computer Science · AISTATS
- 2019

This paper presents the first computationally efficient projection-free algorithm for bandit convex optimization (BCO), which achieves a sublinear regret of $O(nT^{4/5})$ for any bounded convex functions with uniformly bounded gradients.

Towards Gradient Free and Projection Free Stochastic Optimization

- Computer Science · AISTATS
- 2019

A zeroth-order Frank-Wolfe algorithm is proposed, which, in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm, makes it gradient free, and it is shown that the proposed algorithm converges to the optimal objective value at a rate of $O(1/T^{1/3})$, where $T$ denotes the iteration count.
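A gradient-free, projection-free loop of the kind described above can be sketched as follows. This is an illustrative toy over the probability simplex; the two-point finite-difference estimator, the step-size schedule, and all constants are assumptions of the sketch, not the cited algorithm:

```python
def zeroth_order_frank_wolfe(f, d, T=200, mu=1e-4):
    """Hedged sketch of a gradient-free, projection-free method:
    two-point finite differences replace the gradient oracle, and the
    Frank-Wolfe linear minimization oracle over the probability simplex
    (pick the best vertex) replaces the projection step."""
    w = [1.0 / d] * d                        # start at the simplex center
    for t in range(1, T + 1):
        # coordinate-wise finite-difference gradient estimate
        # (queries may fall slightly outside the simplex; f is assumed
        # to be defined on a neighborhood of it)
        grad = []
        for i in range(d):
            wp = list(w); wp[i] += mu
            wm = list(w); wm[i] -= mu
            grad.append((f(wp) - f(wm)) / (2 * mu))
        # LMO over the simplex: the vertex e_i minimizing <grad, e_i>
        i_star = min(range(d), key=lambda i: grad[i])
        gamma = 2.0 / (t + 2)                # standard Frank-Wolfe step size
        w = [(1 - gamma) * wi for wi in w]   # convex combination keeps w feasible
        w[i_star] += gamma
    return w
```

The convex-combination update keeps every iterate inside the feasible set without ever computing a projection, which is the design point projection-free methods trade on when projections are expensive.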