# Theory of Convex Optimization for Machine Learning

```bibtex
@article{Bubeck2014TheoryOC,
  title   = {Theory of Convex Optimization for Machine Learning},
  author  = {S{\'e}bastien Bubeck},
  journal = {ArXiv},
  year    = {2014},
  volume  = {abs/1405.4980}
}
```

This monograph presents the main mathematical ideas in convex optimization. Starting from the fundamental theory of black-box optimization, the material progresses towards recent advances in structural optimization and stochastic optimization. Our presentation of black-box optimization, strongly influenced by the seminal book of Nesterov, includes the analysis of the Ellipsoid Method, as well as (accelerated) gradient descent schemes. We also pay special attention to non-Euclidean settings…
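The (accelerated) gradient descent schemes mentioned in the abstract build on the plain gradient step x ← x − η∇f(x). A minimal sketch in Python, using an illustrative quadratic objective that is not from the monograph:

```python
import numpy as np

def gradient_descent(grad, x0, step, n_iters=100):
    """Plain gradient descent: x <- x - step * grad(x), repeated."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = x - step * grad(x)
    return x

# Illustrative smooth objective f(x) = ||x - 1||^2 / 2, gradient x - 1;
# this function is 1-smooth, so any fixed step in (0, 2) converges.
x_star = gradient_descent(lambda x: x - 1.0, x0=np.zeros(3), step=0.5, n_iters=60)
```

For an L-smooth convex function, a step size of 1/L gives the classical O(1/t) suboptimality rate; Nesterov's acceleration improves this to O(1/t²).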

#### 88 Citations

On Global Linear Convergence in Stochastic Nonconvex Optimization for Semidefinite Programming

- Computer Science
- IEEE Transactions on Signal Processing
- 2019

It is shown that stochastic gradient descent can be adapted to solve the nonconvex reformulation of the original convex problem with global linear convergence when using a fixed step size, i.e., converging exponentially fast to the population minimizer within an optimal statistical precision in the restricted strongly convex case.

Reusing Combinatorial Structure: Faster Iterative Projections over Submodular Base Polytopes

- Computer Science, Mathematics
- ArXiv
- 2021

This work considers iterative projections of close-by points over widely-prevalent submodular base polytopes B(f), and develops a toolkit to speed up the computation of projections using both discrete and continuous perspectives.

Stochastic Gradient Descent For Modern Machine Learning: Theory, Algorithms And Applications

- Computer Science
- 2019

This thesis considers the behavior of the final iterate of SGD with varying stepsize schemes, including the standard polynomially decaying stepsizes and the practically preferred step decay scheme, with an aim to achieve minimax rates.

Provable non-convex projected gradient descent for a class of constrained matrix optimization problems

- Computer Science, Mathematics
- ArXiv
- 2016

The Projected Factored Gradient Descent (ProjFGD) algorithm is proposed, which operates on the low-rank factorization of the variable space; the method is shown to achieve a local linear convergence rate in the non-convex factored space for a class of convex norm-constrained problems.

An accelerated algorithm for delayed distributed convex optimization

- Computer Science
- 2016

This thesis provides a framework for distributed delayed convex optimization methods for networks in a master-server setting and proves that a delayed accelerated method maintains the optimality of the algorithm with a convergence rate of O(1/t²).

Accelerated Extra-Gradient Descent: A Novel Accelerated First-Order Method

- Mathematics, Computer Science
- ITCS
- 2018

A novel accelerated first-order method that achieves the asymptotically optimal convergence rate for smooth functions in the first-order oracle model, motivated by discretizing an accelerated continuous-time dynamics with the classical implicit Euler method.

Alternating Randomized Block Coordinate Descent

- Computer Science, Mathematics
- ICML
- 2018

This work introduces a novel algorithm AR-BCD, whose convergence time scales independently of the least smooth (possibly non-smooth) block, and obtains the first nontrivial accelerated alternating minimization algorithm.

Solving Combinatorial Games using Products, Projections and Lexicographically Optimal Bases

- Computer Science, Mathematics
- ArXiv
- 2016

A novel primal-style algorithm for computing Bregman projections on the base polytopes of polymatroids and a general recipe to simulate the multiplicative weights update algorithm in time polynomial in their natural dimension are given.

Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances

- Mathematics, Computer Science
- ICML
- 2019

This work shows accelerated linear rates in the $p$-Wasserstein metric for any $p \geq 1$, with improved sensitivity to noise for both AG and HB, through a non-asymptotic analysis under some additional assumptions on the noise structure.

Multi-stage stochastic gradient method with momentum acceleration

- Computer Science
- Signal Process.
- 2021

A multi-stage stochastic gradient descent method with momentum acceleration, named MAGNET, for first-order stochastic convex optimization; it obtains an accelerated rate of convergence and is adaptive and free from hyper-parameter tuning.

#### References

Showing 1–10 of 58 references

Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization

- Mathematics, Computer Science
- ICML
- 2013

A new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, is presented, and the broad application areas of this approach are discussed.
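To make the projection-free idea concrete, here is a minimal Frank-Wolfe sketch in Python; the l1-ball linear minimization oracle and the quadratic objective are illustrative assumptions, not taken from this paper:

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, n_iters=200):
    """Frank-Wolfe: each step calls a linear minimization oracle over the
    feasible set instead of projecting, then moves toward its output."""
    x = np.asarray(x0, dtype=float)
    for t in range(n_iters):
        s = lmo(grad(x))           # argmin over the feasible set of <g, s>
        gamma = 2.0 / (t + 2.0)    # standard step-size schedule
        x = (1 - gamma) * x + gamma * s
    return x

def lmo_l1(g, z=1.0):
    """LMO for the l1-ball of radius z: the vertex -z * sign(g_i) * e_i
    at the coordinate with the largest |g_i|."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -z * np.sign(g[i])
    return s

# Minimize f(x) = ||x - b||^2 / 2 over the unit l1-ball, with b outside it.
b = np.array([2.0, 0.5])
x = frank_wolfe(lambda x: x - b, lmo_l1, np.zeros(2), n_iters=200)
```

Each iterate is a convex combination of at most t vertices, which is the source of the sparsity (or low-rank) structure the paper exploits.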

Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n)

- Computer Science, Mathematics
- NIPS
- 2013

We consider the stochastic approximation problem where a convex function has to be minimized, given only the knowledge of unbiased estimates of its gradients at certain points, a framework which…

Introductory Lectures on Convex Optimization - A Basic Course

- Computer Science
- Applied Optimization
- 2004

In the mid-1980s, the seminal paper by Karmarkar opened a new epoch in nonlinear optimization, and it became more and more common for new methods to be provided with a complexity analysis, which was considered a better justification of their efficiency than computational experiments.

Parallel coordinate descent methods for big data optimization

- Mathematics, Computer Science
- Math. Program.
- 2016

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex…

Lectures on modern convex optimization - analysis, algorithms, and engineering applications

- Computer Science, Mathematics
- MPS-SIAM series on optimization
- 2001

The authors present the basic theory of state-of-the-art polynomial time interior point methods for linear, conic quadratic, and semidefinite programming as well as their numerous applications in engineering.

Mirror descent and nonlinear projected subgradient methods for convex optimization

- Mathematics, Computer Science
- Oper. Res. Lett.
- 2003

It is shown that the MDA can be viewed as a nonlinear projected-subgradient type method, derived from using a general distance-like function instead of the usual Euclidean squared distance, and convergence and efficiency estimates are derived in a simple way.
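The entropic instance of mirror descent (the exponentiated-gradient update on the probability simplex) can be sketched as follows; the linear objective and step size below are illustrative assumptions, not from this paper:

```python
import numpy as np

def entropic_mirror_descent(grad, x0, step, n_iters=500):
    """Mirror descent on the probability simplex with the negative-entropy
    mirror map: the Bregman projection turns each step into a
    multiplicative update followed by renormalization."""
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for _ in range(n_iters):
        x = x * np.exp(-step * grad(x))  # multiplicative step
        x = x / x.sum()                  # renormalize onto the simplex
        avg += x
    return avg / n_iters                 # averaged iterate

# Minimize the linear function <c, x> over the simplex; the minimizer
# puts all mass on the smallest coordinate of c (index 1 here).
c = np.array([0.3, 0.1, 0.6])
x_avg = entropic_mirror_descent(lambda x: c, np.ones(3) / 3, step=0.5)
```

The appeal of the non-Euclidean setup is dimension-dependence: with the entropy mirror map the rate scales with log n rather than n for l1-geometry problems.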

Interior-point polynomial algorithms in convex programming

- Mathematics, Computer Science
- Siam studies in applied mathematics
- 1994

This book describes the first unified theory of polynomial-time interior-point methods; several of the new algorithms, e.g., the projective method, have been implemented, tested on "real world" problems, and found to be extremely efficient in practice.

A mathematical view of interior-point methods in convex optimization

- Mathematics, Computer Science
- MPS-SIAM series on optimization
- 2001

This compact book will take a reader who knows little of interior-point methods to within sight of the research frontier, developing key ideas that were over a decade in the making by numerous interior-point method researchers.

Sublinear Optimization for Machine Learning

- Mathematics, Computer Science
- 2010 IEEE 51st Annual Symposium on Foundations of Computer Science
- 2010

Lower bounds are given which show the running times of many of the algorithms to be nearly best possible in the unit-cost RAM model, together with implementations of these algorithms in the semi-streaming setting, obtaining the first low-pass, polylogarithmic-space, sublinear-time algorithms achieving an arbitrary approximation factor.

Efficient projections onto the l1-ball for learning in high dimensions

- Mathematics, Computer Science
- ICML '08
- 2008

Efficient algorithms for projecting a vector onto the l1-ball are described and variants of stochastic gradient projection methods augmented with these efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques.
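A standard sort-based l1-ball projection in the spirit of this reference can be sketched in a few lines of Python; this is a minimal O(n log n) illustration, not the paper's full set of algorithms:

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto the l1-ball of radius z via the
    classical sort-and-threshold scheme: find the soft-threshold theta
    such that the shrunk vector has l1-norm exactly z."""
    v = np.asarray(v, dtype=float)
    if np.abs(v).sum() <= z:
        return v.copy()                       # already feasible
    u = np.sort(np.abs(v))[::-1]              # magnitudes, descending
    cssv = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (cssv - z))[0][-1]
    theta = (cssv[rho] - z) / (rho + 1)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

w = project_l1_ball(np.array([0.5, -1.5, 0.2]), z=1.0)
```

The projection soft-thresholds the entries, so it tends to zero out small coordinates, which is what makes projected gradient methods attractive for sparse learning in high dimensions.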