• Corpus ID: 231924424

Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization

Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain
We study online learning with bandit feedback (i.e., the learner has access to only a zeroth-order oracle) where the cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e., $f_t(w) = \ell_t(g_t(w))$, where the output of $g_t$ is one-dimensional. At each round, the learner observes a context $x_t$, plays a prediction $g_t(w_t; x_t)$ (e.g., $g_t(\cdot) = \langle x_t, \cdot \rangle$) for some $w_t \in \mathbb{R}^d$, and observes the loss $\ell_t(g_t(w_t))$, where $\ell_t$ is a convex Lipschitz-continuous function. The goal is to minimize the standard regret metric. This pseudo-1d bandit…
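The interaction protocol in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: as a stand-in learner it uses a generic one-point gradient estimate with projected gradient descent, and every name and parameter (`run_protocol`, `delta`, `eta`, the hidden `w_star`, the absolute-value loss, the unit-ball domain) is an assumption made for illustration. The losses are chosen so that a fixed comparator `w_star` incurs zero loss, which makes the regret against `w_star` equal to the learner's cumulative loss.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def unit_vector(rng, d):
    """Draw a direction uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(v, v))
    return [vi / norm for vi in v]


def project_to_ball(w, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(dot(w, w))
    if norm <= radius:
        return list(w)
    return [wi * radius / norm for wi in w]


def run_protocol(T=2000, d=5, delta=0.2, eta=0.005, seed=0):
    """Simulate T rounds of pseudo-1d bandit feedback; return regret vs. w_star."""
    rng = random.Random(seed)
    # Hypothetical hidden comparator inside the unit ball; it incurs zero loss.
    w_star = [0.5 * c for c in unit_vector(rng, d)]
    w = [0.0] * d
    regret = 0.0
    for _ in range(T):
        x = unit_vector(rng, d)                      # context x_t
        y = dot(x, w_star)
        loss_1d = lambda z, y=y: abs(z - y)          # convex, 1-Lipschitz ell_t
        u = unit_vector(rng, d)                      # exploration direction
        played = [wi + delta * ui for wi, ui in zip(w, u)]
        observed = loss_1d(dot(x, played))           # the only feedback received
        regret += observed                           # w_star's loss is 0 each round
        # One-point gradient estimate (assumed baseline, not the paper's method).
        grad_est = [(d / delta) * observed * ui for ui in u]
        stepped = [wi - eta * gi for wi, gi in zip(w, grad_est)]
        # Shrink the domain so the perturbed point stays inside the unit ball.
        w = project_to_ball(stepped, 1.0 - delta)
    return regret
```

Calling `run_protocol()` with a fixed `seed` gives a reproducible cumulative regret. Note that the learner never sees `x`'s effect on the loss beyond the single scalar `observed`, which is what distinguishes bandit feedback from the full-information setting.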
2 Citations

Minimax Regret for Bandit Convex Optimisation of Ridge Functions
A short information-theoretic proof is provided that the minimax regret is at most $O(d\sqrt{n}\log(n\,\mathrm{diam}(K)))$, where $n$ is the number of interactions, $d$ the dimension, and $\mathrm{diam}(K)$ the diameter of the constraint set.
A Survey of Decentralized Online Learning
A thorough overview of DOL from the perspective of problem settings, communication, computation, and performance is provided, and some potential future directions are also discussed in detail.


Logarithmic regret algorithms for online convex optimization
Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Kernel-based methods for bandit convex optimization
We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n)\sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free
Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback
The first algorithm whose expected regret is $O(T^{2/3})$, ignoring constant and logarithmic factors, is given, building upon existing work on self-concordant regularizers and one-point gradient estimation.
Optimistic Bandit Convex Optimization
This is the first algorithm admitting both a polynomial time complexity and a regret that is polynomial in the dimension of the action space that improves upon the original regret bound for Lipschitz loss functions, achieving a regret of $\widetilde O(T^{11/16}d^{3/8})$.
On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization
  • O. Shamir
  • Computer Science, Mathematics
  • 2013
The attainable error/regret in the bandit and derivative-free settings, as a function of the dimension d and the available number of queries T is investigated, and a precise characterization of the attainable performance for strongly-convex and smooth functions is provided.
On the Complexity of Bandit Linear Optimization
It is shown that the price of bandit information in this setting can be as large as $d$, disproving the well-known conjecture that the regret for bandit linear optimization is at most $\sqrt{d}$ times the full-information regret.
Stochastic Convex Optimization with Bandit Feedback
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model and demonstrates a generalization of the ellipsoid algorithm that incurs $O(\mathrm{poly}(d)\sqrt{T})$ regret.
Multi-scale exploration of convex functions and bandit convex optimization
This paper uses a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function, to solve a decade-old open problem in adversarial bandit convex optimization.
Projection-Free Bandit Convex Optimization
This paper proposes the first computationally efficient projection-free algorithm for bandit convex optimization (BCO) and shows that it achieves a sublinear regret of $O(nT^{4/5})$ for any bounded convex functions with uniformly bounded gradients.
Towards Gradient Free and Projection Free Stochastic Optimization
A zeroth-order Frank-Wolfe algorithm is proposed, which in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm makes it gradient free, and it is shown that the proposed algorithm converges to the optimal objective function at a rate of $O(1/T^{1/3})$, where $T$ denotes the iteration count.