Conditional Gradient Sliding for Convex Optimization

@article{Lan2016ConditionalGS,
  title={Conditional Gradient Sliding for Convex Optimization},
  author={Guanghui Lan and Yi Zhou},
  journal={SIAM J. Optim.},
  year={2016},
  volume={26},
  pages={1379-1409}
}
In this paper, we present a new conditional gradient type method for convex optimization by calling a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. Different from the classic conditional gradient method, the conditional gradient sliding (CGS) algorithm developed herein can skip the computation of gradients from time to time and, as a result, can achieve the optimal complexity bounds in terms of not only the number of calls to the LO oracle…
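
For concreteness, here is a minimal sketch of the classic conditional gradient (Frank-Wolfe) iteration that CGS builds on, in which the feasible set is touched only through one LO-oracle call per step. The names grad_f and lo_oracle, the open-loop step size, and the simplex example are illustrative assumptions rather than the paper's code; CGS itself goes further by embedding such LO steps inside an accelerated gradient scheme so that gradient evaluations can be skipped from time to time.

```python
import numpy as np

def conditional_gradient(grad_f, lo_oracle, x0, num_iters=100):
    """Classic conditional gradient (Frank-Wolfe) loop.

    grad_f(x)    -> gradient of the smooth objective at x
    lo_oracle(g) -> argmin_{y in X} <g, y>; the only access to the feasible set X
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        g = grad_f(x)                    # one gradient evaluation per iteration
        y = lo_oracle(g)                 # one LO-oracle call (linear minimization over X)
        gamma = 2.0 / (k + 2)            # standard open-loop step size
        x = (1 - gamma) * x + gamma * y  # convex combination keeps x feasible
    return x

# Toy usage: minimize ||x - b||^2 over the probability simplex, whose LO oracle
# returns the vertex (unit coordinate vector) with the smallest gradient entry.
b = np.array([0.2, 0.5, 0.9])
x_star = conditional_gradient(lambda x: 2.0 * (x - b),
                              lambda g: np.eye(g.size)[np.argmin(g)],
                              x0=np.ones(3) / 3)
```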

Second-order Conditional Gradient Sliding.

The SOCGS algorithm is presented, which uses a projection-free algorithm to solve the constrained quadratic subproblems inexactly; it is useful when the feasible region can be accessed efficiently only through a linear optimization oracle and computing first-order information of the function, although possible, is costly.

Towards Gradient Free and Projection Free Stochastic Optimization

A zeroth order Frank-Wolfe algorithm is proposed, which in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm makes it gradient free, and it is shown that the proposed algorithm converges to the optimal objective value at a rate of $O(1/T^{1/3})$, where $T$ denotes the iteration count.
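
Conceptually, the gradient-free step replaces the exact gradient in a Frank-Wolfe update with an estimate built purely from function values. The sketch below uses a simple two-point random-direction estimator; the estimator, averaging, and step-size choices in the cited paper may differ, and f, lo_oracle, num_dirs, and the smoothing radius mu are illustrative assumptions.

```python
import numpy as np

def zeroth_order_fw_step(f, lo_oracle, x, k, num_dirs=20, mu=1e-4, rng=None):
    """One projection-free, gradient-free step: estimate the gradient from
    function values along random directions, then take a Frank-Wolfe step."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    g_hat = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        # two-point finite-difference estimate of the directional derivative,
        # accumulated along the sampled direction u
        g_hat += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    g_hat /= num_dirs
    y = lo_oracle(g_hat)      # linear minimization over the feasible set (no projection)
    gamma = 2.0 / (k + 2)
    return (1.0 - gamma) * x + gamma * y
```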

Projection Efficient Subgradient Method and Optimal Nonsmooth Frank-Wolfe Method

The MOPES method is introduced, which carefully combines Moreau-Yosida smoothing and accelerated first-order schemes, along with the MOLES method, which is guaranteed to find a feasible $\epsilon$-suboptimal solution in high dimensions using only $O(\epsilon^{-1})$ PO calls and an optimal $O(\epsilon^{-2})$ FO calls.
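
For reference, the Moreau-Yosida smoothing mentioned above replaces a nonsmooth convex $f$ by its Moreau envelope, a smooth surrogate with the same minimizers and a $1/\mu$-Lipschitz gradient; in standard notation (not necessarily the paper's),

$$f_{\mu}(x) \;=\; \min_{y}\Big\{ f(y) + \tfrac{1}{2\mu}\,\|x-y\|_2^2 \Big\}, \qquad \nabla f_{\mu}(x) \;=\; \tfrac{1}{\mu}\big(x - \mathrm{prox}_{\mu f}(x)\big).$$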

On the Online Frank-Wolfe Algorithms for Convex and Non-convex Optimizations

The online variants of the classical Frank-Wolfe algorithm only require simple iterative updates and a non-adaptive step size rule, in contrast to the hybrid schemes commonly considered in the literature, and are shown to converge even when the loss is non-convex.

Conditional Gradient Methods for Convex Optimization with Function Constraints

This paper presents a new constraint-extrapolated conditional gradient method that can achieve an $\mathcal{O}(1/\epsilon^2)$ iteration complexity for both smooth and structured nonsmooth function-constrained convex optimization.

Restarting Frank-Wolfe

A new variant of Conditional Gradients is presented that can dynamically adapt to the function's geometric properties using restarts and thus smoothly interpolates between the sublinear and linear regimes; it applies to generic compact convex constraint sets.

Universal Conditional Gradient Sliding for Convex Optimization

This is the first time a sliding-type algorithm is able to improve not only the gradient complexity but also the overall complexity for computing an approximate solution.

Conditional gradient type methods for composite nonlinear and stochastic optimization

This paper presents a unified analysis for the CGT method in the sense that it achieves the best known rate of convergence when the weakly smooth term is nonconvex and possesses (nearly) optimal complexity if it turns out to be convex.

...

References

Showing 1-10 of 42 references

Gradient sliding for composite optimization

If the smooth component in the composite function is strongly convex, the developed gradient sliding algorithms can significantly reduce the number of gradient and subgradient evaluations for the smooth and nonsmooth components, respectively, to $O(1/\epsilon)$.

The Complexity of Large-scale Convex Programming under a Linear Optimization Oracle

This paper formally establishes the theoretical optimality or near-optimality, in the large-scale case, of the CG method and its variants for solving different classes of CP problems, including smooth, nonsmooth, and certain saddle-point problems.

Iterated Hard Shrinkage for Minimization Problems with Sparsity Constraints

It is shown that the hard shrinkage algorithm is a special case of the generalized conditional gradient method with quadratic discrepancy term, and strong convergence of the iterates is established with convergence rates $\mathcal{O}(n^{-1/2})$ and $\mathcal{O}(\lambda^n)$ for $p=1$ and $1 < p \leq 2$, respectively.

Sparse Convex Optimization Methods for Machine Learning

A convergence proof guaranteeing $\epsilon$-small error after $O(1/\epsilon)$ iterations is given, and the sparsity of approximate solutions for any $\ell_1$-regularized convex optimization problem (and for optimization over the simplex) is expressed as a function of the approximation quality.

A Linearly Convergent Conditional Gradient Algorithm with Applications to Online and Stochastic Optimization

A novel conditional gradient algorithm for smooth and strongly convex optimization over polyhedral sets that performs only a single linear optimization step over the domain on each iteration and enjoys a linear convergence rate, which gives an exponential improvement in convergence rate over previous results.

An optimal method for stochastic composite optimization

The accelerated stochastic approximation (AC-SA) algorithm based on Nesterov’s optimal method for smooth CP is introduced, and it is shown that the AC-SA algorithm can achieve the aforementioned lower bound on the rate of convergence for SCO.

Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization

A new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, is presented, and the broad application areas of this approach are discussed.
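
To illustrate why each iteration is a low-rank update: when the feasible set is a nuclear-norm ball, linear minimization only needs the leading singular pair of the gradient, so every Frank-Wolfe step adds a rank-one matrix to the iterate. A minimal sketch under that assumption (the radius tau, the function name, and the use of SciPy's truncated SVD are mine, not the paper's):

```python
import numpy as np
from scipy.sparse.linalg import svds

def nuclear_norm_lmo(grad, tau):
    """Linear minimization oracle over {S : ||S||_* <= tau}.

    argmin_{||S||_* <= tau} <grad, S> = -tau * u v^T, where (u, v) is the top
    singular vector pair of grad, so the returned direction has rank one.
    """
    grad = np.asarray(grad, dtype=float)
    u, _, vt = svds(grad, k=1)                  # leading singular triplet only
    return -tau * np.outer(u[:, 0], vt[0, :])
```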

Convergence Rates for Conditional Gradient Sequences Generated by Implicit Step Length Rules

Conditional gradient algorithms with implicit line minimization and Goldstein–Armijo step length rules are considered for the problem $\min_\Omega F$ with $\Omega$ a bounded convex subset of a real…

Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization

A randomized stochastic projected gradient (RSPG) algorithm, in which a proper mini-batch of samples is taken at each iteration depending on the total budget of stochastic samples allowed, is proposed and shown to achieve nearly optimal complexity for convex stochastic programming.

Dual subgradient algorithms for large-scale nonsmooth learning problems

This work proposes a novel approach to solving nonsmooth optimization problems arising in learning applications where a Fenchel-type representation of the objective function is available; it requires the problem domain to admit a Linear Optimization oracle, i.e., the ability to efficiently maximize a linear form over the domain of the primal problem.