• Corpus ID: 219687747

Walking in the Shadow: A New Perspective on Descent Directions for Constrained Minimization

@article{Mortagy2020WalkingIT,
title={Walking in the Shadow: A New Perspective on Descent Directions for Constrained Minimization},
author={Hassan Mortagy and Swati Gupta and Sebastian Pokutta},
journal={ArXiv},
year={2020},
volume={abs/2006.08426}
}
• Published 15 June 2020
• Computer Science
• ArXiv
Descent directions such as movement towards Frank-Wolfe vertices, away steps, in-face away steps and pairwise directions have been an important design consideration in conditional gradient descent (CGD) variants. In this work, we attempt to demystify the impact of movement in these directions towards attaining constrained minimizers. The best local direction of descent is the directional derivative of the projection of the gradient, which we refer to as the $\textit{shadow}$ of the gradient. We…
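As a rough illustration of the directions named in the abstract, the sketch below compares a Frank-Wolfe vertex with a finite-difference approximation of the shadow on the probability simplex. It assumes the shadow at a point $x$ can be read as $\lim_{\epsilon \to 0^+} (\Pi(x - \epsilon \nabla f(x)) - x)/\epsilon$, where $\Pi$ is the Euclidean projection. The choice of the simplex, the step `eps`, and the function names `project_simplex`, `frank_wolfe_vertex`, and `approx_shadow` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def project_simplex(y):
    """Euclidean projection of y onto the probability simplex (sort-based)."""
    u = np.sort(y)[::-1]                      # entries in decreasing order
    css = np.cumsum(u) - 1.0                  # shifted cumulative sums
    k = np.arange(1, len(y) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    tau = css[rho] / (rho + 1)                # uniform shift to apply
    return np.maximum(y - tau, 0.0)


def frank_wolfe_vertex(grad):
    """Simplex vertex minimizing <grad, v>: the linear minimization step."""
    v = np.zeros_like(grad)
    v[np.argmin(grad)] = 1.0
    return v


def approx_shadow(x, grad, eps=1e-6):
    """Finite-difference stand-in for the shadow (assumed definition above)."""
    return (project_simplex(x - eps * grad) - x) / eps


grad = np.array([1.0, 0.0, 2.0])
x_vertex = np.array([1.0, 0.0, 0.0])
x_interior = np.full(3, 1.0 / 3.0)

print(frank_wolfe_vertex(grad))          # vertex the Frank-Wolfe step moves towards
print(approx_shadow(x_vertex, grad))     # feasible descent direction at a vertex
print(approx_shadow(x_interior, grad))   # centered negative gradient in the interior
```

In the interior the approximate shadow is just the negative gradient with its mean removed, while at a vertex the projection clips the components that would leave the simplex. This is only a numerical stand-in; it does not implement the algorithms of the paper.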

Practical Frank-Wolfe algorithms
This work generalizes in-face Frank-Wolfe directions to polytopes in which faces cannot be efficiently computed, and describes a generic recursive procedure that can be used in conjunction with several FW-style techniques.
Affine Invariant Analysis of Frank-Wolfe on Strongly Convex Sets
• Computer Science, Mathematics
ICML
• 2021
It is shown that typical backtracking line searches using smoothness of the objective function surprisingly converge to an affine-invariant step size, despite using affine-dependent norms in the step size's computation.
Sparser Kernel Herding with Pairwise Conditional Gradients without Swap Steps
• Computer Science
• 2021
This work proposes a new variant of PCG, the so-called Blended Pairwise Conditional Gradients (BPCG), which does not exhibit any swap steps, is very easy to implement, and does not require any internal gradient alignment procedures.
Avoiding bad steps in Frank Wolfe variants
• Computer Science
• 2020
This work defines the Short Step Chain (SSC) procedure, which skips gradient computations in consecutive short steps until proper stopping conditions are satisfied. The procedure allows a unified analysis and convergence rates in the general smooth nonconvex setting, as well as a linear convergence rate under a Kurdyka-Łojasiewicz (KL) property.

References

Linearly convergent away-step conditional gradient for non-strongly convex functions
• Mathematics, Computer Science
Math. Program.
• 2017
A variant of the algorithm and an analysis based on simple linear programming duality arguments, as well as corresponding error bounds are provided, which enables the incorporation of the additional linear term and depends on a new constant, that is explicitly expressed in terms of the problem’s parameters and the geometry of the feasible set.
Blended Conditional Gradients: the unconditioning of conditional gradients
• Computer Science
ICML 2019
• 2018
This work presents a blended conditional gradient approach for minimizing a smooth convex function over a polytope P, combining the Frank-Wolfe algorithm with gradient-based steps, achieving linear convergence for strongly convex functions, along with good practical performance.
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
• Computer Science, Mathematics
ECML/PKDD
• 2016
This work shows that the much older Polyak-Łojasiewicz (PL) inequality is actually weaker than the main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years, leading to simple proofs of linear convergence of these methods.
An Extended Frank-Wolfe Method with "In-Face" Directions, and Its Application to Low-Rank Matrix Completion
• Computer Science
SIAM J. Optim.
• 2017
This work presents an extension of the Frank-Wolfe method that is designed to induce near-optimal solutions on low-dimensional faces of the feasible region by a new approach to generating "in-face" directions at each iteration, as well as through new choice rules for selecting between "in-face" and "regular" Frank-Wolfe steps.
On the Global Linear Convergence of Frank-Wolfe Optimization Variants
• Computer Science
NIPS
• 2015
This paper highlights and clarifies several variants of the Frank-Wolfe optimization algorithm that have been successfully applied in practice: away-steps FW, pairwise FW, fully-corrective FW and Wolfe's minimum norm point algorithm, and proves for the first time that they all enjoy global linear convergence, under a weaker condition than strong convexity of the objective.
Boosting Frank-Wolfe by Chasing Gradients
• Computer Science
ICML
• 2020
This paper proposes to speed up the Frank-Wolfe algorithm by better aligning the descent direction with that of the negative gradient via a subroutine, and derives convergence rates of the method ranging from $\mathcal{O}(1/t)$ to $\mathcal{O}(e^{-\omega t})$.
Polytope Conditioning and Linear Convergence of the Frank-Wolfe Algorithm
• Mathematics
Math. Oper. Res.
• 2019
For a convex quadratic objective, it is shown that the rate of convergence is determined by a condition number of a suitably scaled polytope, and new insight is given into the linear convergence property.
Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes
• Computer Science
NIPS
• 2016
A new conditional gradient variant and a corresponding analysis that improves on both of the above shortcomings, and a novel way to compute decomposition-invariant away-steps that applies to several important structured polytopes that capture central concepts.
Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization
A new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, is presented, and the broad application areas of this approach are discussed.
Conditional Gradient Sliding for Convex Optimization
• Computer Science
SIAM J. Optim.
• 2016
The conditional gradient sliding (CGS) algorithm developed herein can skip the computation of gradients from time to time and, as a result, can achieve the optimal complexity bounds in terms of not only the number of calls to the $LO$ oracle but also the number of gradient evaluations.