# Conditional Gradient Sliding for Convex Optimization

@article{Lan2016ConditionalGS,
  title   = {Conditional Gradient Sliding for Convex Optimization},
  author  = {Guanghui Lan and Yi Zhou},
  journal = {SIAM J. Optim.},
  year    = {2016},
  volume  = {26},
  pages   = {1379--1409}
}
• Published 29 June 2016
• Computer Science
• SIAM J. Optim.
In this paper, we present a new conditional gradient-type method for convex optimization by calling a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. Different from the classic conditional gradient method, the conditional gradient sliding (CGS) algorithm developed herein can skip the computation of gradients from time to time and, as a result, can achieve the optimal complexity bounds in terms of not only the number of calls to the LO oracle…
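To make the role of the LO oracle concrete, the sketch below shows the classic conditional gradient (Frank-Wolfe) iteration that CGS starts from: each step calls the LO oracle once to minimize a linear function over the feasible set and then takes a convex combination, so no projection is ever needed. The simplex oracle, step-size rule, and least-squares instance are illustrative assumptions, not the paper's CGS scheme.

```python
import numpy as np

def lo_oracle_simplex(g):
    """Illustrative LO oracle for the probability simplex:
    argmin over the simplex of <g, v> is attained at a vertex."""
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

def conditional_gradient(grad, lo_oracle, x0, num_iters=100):
    """Classic conditional gradient (Frank-Wolfe) loop, for reference."""
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        g = grad(x)                      # one gradient evaluation per iteration
        v = lo_oracle(g)                 # one LO-oracle call per iteration
        gamma = 2.0 / (k + 2)            # standard open-loop step size
        x = (1 - gamma) * x + gamma * v  # convex combination keeps x feasible
    return x

# Hypothetical example: least squares over the 2-dimensional simplex.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
grad = lambda x: A.T @ (A @ x - b)
x_hat = conditional_gradient(grad, lo_oracle_simplex, x0=np.array([0.5, 0.5]))
```

Roughly speaking, CGS keeps this same LO-oracle interface but nests the linear-minimization steps inside an accelerated outer scheme, so many LO calls can share a single gradient evaluation; that is how it skips gradient computations while retaining optimal oracle complexity.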

## 141 Citations

• Computer Science
• 2020
The SOCGS algorithm is presented, which solves the constrained quadratic subproblems inexactly with a projection-free method; it is useful when the feasible region can only be accessed efficiently through a linear optimization oracle and when computing first-order information of the function, although possible, is costly.
• Computer Science
AISTATS
• 2019
A zeroth-order Frank-Wolfe algorithm is proposed, which in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm makes it gradient-free, and it is shown that the proposed algorithm converges to the optimal objective value at a rate of $O(1/T^{1/3})$, where $T$ denotes the iteration count.
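For intuition, here is a minimal, hypothetical sketch of the zeroth-order idea: the gradient in the Frank-Wolfe step is replaced by a random finite-difference estimate built only from function values. The two-point estimator, number of sampled directions, and step size below are illustrative choices, not the estimator analyzed in the cited paper.

```python
import numpy as np

def zo_frank_wolfe(f, lo_oracle, x0, num_iters=200, mu=1e-4, num_dirs=10, rng=None):
    """Projection-free and gradient-free loop in the spirit of zeroth-order Frank-Wolfe.
    f         -- objective, accessed only through function values
    lo_oracle -- solves argmin_{v in X} <g, v> for a given vector g
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.size
    for k in range(num_iters):
        g_hat = np.zeros(d)
        for _ in range(num_dirs):
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)                              # random unit direction
            g_hat += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
        g_hat *= d / num_dirs                                   # rescale directional estimates
        v = lo_oracle(g_hat)                                    # one LO-oracle call per iteration
        gamma = 2.0 / (k + 2)
        x = (1 - gamma) * x + gamma * v
    return x
```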
• Computer Science, Mathematics
NeurIPS
• 2020
MOPES method is introduced, which carefully combines Moreau-Yosida smoothing and accelerated first-order schemes, and MOLES method, which is guaranteed to find a feasible $\epsilon$-suboptimality in high-dimensions solution using only $O(\ep silon^{-1})$ PO calls and optimal $O(epsilON^{-2})$ FO calls.
• Computer Science
• 2015
The online variants of the classical Frank-Wolfe algorithm only require simple iterative updates and a non-adaptive step size rule, in contrast to the hybrid schemes commonly considered in the literature, and are shown to converge even when the loss is non-convex.
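A hypothetical sketch of such an online, projection-free update is shown below: each round makes one LO-oracle call against a running average of observed gradients and mixes in the answer with a fixed, non-adaptive step size. The gradient averaging and the $1/\sqrt{t}$ schedule are assumptions for illustration, not the exact scheme of the cited paper.

```python
import numpy as np

def online_frank_wolfe(lo_oracle, x0, loss_grads, step=lambda t: 1.0 / np.sqrt(t + 1)):
    """Online Frank-Wolfe sketch: one LO-oracle call and a non-adaptive step size per round.
    loss_grads -- iterable of callables, each returning the gradient of that round's loss at x
    """
    x = np.asarray(x0, dtype=float)
    g_avg = np.zeros_like(x)
    for t, grad_t in enumerate(loss_grads):
        g_avg = (t * g_avg + grad_t(x)) / (t + 1)   # running average of observed gradients
        v = lo_oracle(g_avg)                        # linear optimization over the feasible set
        gamma = step(t)                             # fixed schedule, no line search
        x = (1 - gamma) * x + gamma * v             # stays inside the convex feasible set
    return x
```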
• Computer Science
ArXiv
• 2020
This paper presents a new constraint-extrapolated conditional gradient method that achieves an ${\cal O}(1/\epsilon^2)$ iteration complexity for both smooth and structured nonsmooth function-constrained convex optimization.
• Computer Science, Mathematics
AISTATS
• 2019
A new variant of Conditional Gradients is presented that can dynamically adapt to the function's geometric properties using restarts, thus smoothly interpolating between the sublinear and linear regimes; it applies to generic compact convex constraint sets.
• Computer Science
• 2021
This is the first time a sliding-type algorithm is able to improve not only the gradient complexity but also the overall complexity for computing an approximate solution.
• Computer Science, Mathematics
Mathematical Programming
• 2018
This paper presents a unified analysis for the CGT method in the sense that it achieves the best known rate of convergence when the weakly smooth term is nonconvex and possesses (nearly) optimal complexity if it turns out to be convex.

## References

Showing 1–10 of 42 references.

If the smooth component in the composite function is strongly convex, the developed gradient sliding algorithms can significantly reduce the number of gradient and subgradient evaluations for the smooth and nonsmooth components, respectively, to $O(1/\epsilon)$.
This paper formally establishes the theoretical optimality or near-optimality, in the large-scale case, of the CG method and its variants for solving different classes of CP problems, including smooth, nonsmooth, and certain saddle-point problems.
• Mathematics
SIAM J. Sci. Comput.
• 2008
It is shown that the hard shrinkage algorithm is a special case of the generalized conditional gradient method with a quadratic discrepancy term, and strong convergence properties of the iterates are established, with convergence rates $\mathcal{O}(n^{-1/2})$ and $\mathcal{O}(\lambda^n)$ for $p=1$ and $1 < p \leq 2$, respectively.
A convergence proof guaranteeing $\epsilon$-small error after $O(1/\epsilon)$ iterations is given, and the sparsity of approximate solutions for any $\ell_1$-regularized convex optimization problem (and for optimization over the simplex) is characterized as a function of the approximation quality.
• Computer Science
• 2013
A novel conditional gradient algorithm is presented for smooth and strongly convex optimization over polyhedral sets; it performs only a single linear optimization step over the domain at each iteration and enjoys a linear convergence rate, an exponential improvement in convergence rate over previous results.
The accelerated stochastic approximation (AC-SA) algorithm based on Nesterov’s optimal method for smooth CP is introduced, and it is shown that the AC-SA algorithm can achieve the aforementioned lower bound on the rate of convergence for SCO.
A new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, is presented, and the broad application areas of this approach are discussed.
Conditional gradient algorithms with implicit line minimization and Goldstein–Armijo step length rules are considered for the problem $\min_\Omega F$ with $\Omega$ a bounded convex subset of a real…
• Computer Science
Math. Program.
• 2016
A randomized stochastic projected gradient (RSPG) algorithm is proposed, in which a proper mini-batch of samples is taken at each iteration depending on the total budget of stochastic samples allowed; the algorithm is shown to have nearly optimal complexity for convex stochastic programming.
• Computer Science
Math. Program.
• 2014
This work proposes a novel approach to solving nonsmooth optimization problems arising in learning applications where a Fenchel-type representation of the objective function is available; the approach requires the problem domain to admit a linear optimization oracle, that is, the ability to efficiently maximize a linear form over the domain of the primal problem.