Corpus ID: 211532610

Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization

@article{Negiar2020StochasticFF,
  title={Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization},
  author={Geoffrey Negiar and Gideon Dresdner and Alicia Y. Tsai and Laurent El Ghaoui and Francesco Locatello and Fabian Pedregosa},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.11860}
}
We propose a novel Stochastic Frank-Wolfe (a.k.a. conditional gradient) algorithm for constrained smooth finite-sum minimization with a generalized linear prediction/structure. This class of problems includes empirical risk minimization with sparse, low-rank, or other structured constraints. The proposed method is simple to implement, does not require step-size tuning, and has a constant per-iteration cost that is independent of the dataset size. Furthermore, as a byproduct of the method we…
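For intuition on the projection-free template this abstract builds on, here is a minimal, generic stochastic Frank-Wolfe sketch in Python. It is not the paper's algorithm (which additionally exploits the generalized linear structure to avoid step-size tuning); the mini-batch least-squares objective, L1-ball constraint, and 2/(t+2) step size are illustrative assumptions.

# Generic stochastic Frank-Wolfe sketch (NOT the exact method of the paper above).
# Problem assumed: minimize (1/2n) * ||A x - b||^2 subject to ||x||_1 <= radius.
import numpy as np

def lmo_l1_ball(grad, radius):
    # Linear minimization oracle for the L1 ball: the minimizing vertex puts
    # all mass (with opposite sign) on the largest-magnitude gradient entry.
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def stochastic_frank_wolfe(A, b, radius=1.0, n_iters=1000, batch_size=32, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)                      # feasible start: 0 lies in the L1 ball
    for t in range(n_iters):
        idx = rng.integers(0, n, size=batch_size)
        residual = A[idx] @ x - b[idx]   # mini-batch residual
        grad = A[idx].T @ residual / batch_size
        s = lmo_l1_ball(grad, radius)    # linear subproblem instead of a projection
        gamma = 2.0 / (t + 2)            # classical FW step size (an assumption here)
        x = (1 - gamma) * x + gamma * s  # convex combination keeps x feasible
    return x

# Example usage:
# x_hat = stochastic_frank_wolfe(np.random.randn(200, 50), np.random.randn(200))

The point of the sketch is the structure shared by the methods on this page: each iteration calls a linear minimization oracle over the constraint set rather than a projection, which is what makes sparse or low-rank constraints cheap to handle.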

Citations

Deep Neural Network Training with Frank-Wolfe
TLDR
The general feasibility of training neural networks whose parameters are constrained by a convex feasible region using Frank-Wolfe algorithms is shown, and it is demonstrated that, by choosing an appropriate region, one can achieve performance exceeding that of unconstrained stochastic gradient descent and matching state-of-the-art results relying on L^2-regularization.
Projection-Free Adaptive Gradients for Large-Scale Optimization
TLDR
This paper proposes to solve the occurring constrained optimization subproblems via a small, fixed number of iterations of the Frank-Wolfe algorithm (often only 2), in order to preserve the low per-iteration complexity.
Sequential Quadratic Optimization for Nonlinear Equality Constrained Stochastic Optimization
TLDR
Under reasonable assumptions, convergence (resp. convergence in expectation) from remote starting points is proved for the proposed deterministic (resp. stochastic) algorithm.
No-Regret Dynamics in the Fenchel Game: A Unified Framework for Algorithmic Convex Optimization
TLDR
It is shown that many classical first-order methods for convex optimization (including average-iterate gradient descent, the Frank-Wolfe algorithm, the Heavy Ball algorithm, and Nesterov's acceleration methods) can be interpreted as special cases of this framework, as long as each player makes the correct choice of no-regret strategy.
Parameter-free Locally Accelerated Conditional Gradients
TLDR
A novel Parameter-Free Locally accelerated Conditional Gradients (PF-LaCG) algorithm is proposed, for which rigorous convergence guarantees are provided; experiments demonstrate local acceleration and showcase the practical improvements of PF-LaCG over non-accelerated algorithms, both in iteration count and wall-clock time.

References

SHOWING 1-10 OF 31 REFERENCES
Generalized stochastic Frank–Wolfe algorithm with stochastic “substitute” gradient for structured convex optimization
TLDR
This work presents a new generalized stochastic Frank–Wolfe method which closes the gap in the dependence on the optimality tolerance, and introduces the notion of a “substitute gradient” that is a not-necessarily-unbiased estimate of the gradient.
Variance-Reduced and Projection-Free Stochastic Optimization
TLDR
Stochastic Frank-Wolfe variants are proposed which substantially improve previous results in terms of the number of stochastic gradient evaluations needed to achieve 1 − ε accuracy, with the improvements observed in experiments on real-world datasets for a multiclass classification application.
Stochastic Frank-Wolfe for Composite Convex Minimization
TLDR
This work proposes the first conditional-gradient-type method for solving stochastic optimization problems under affine constraints, and guarantees an O(k^{-1/3}) convergence rate in expectation on the objective residual and O(k^{-5/12}) on the feasibility gap.
Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization
TLDR
A new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, is presented, and the broad application areas of this approach are discussed.
On the Global Linear Convergence of Frank-Wolfe Optimization Variants
TLDR
This paper highlights and clarifies several variants of the Frank-Wolfe optimization algorithm that have been successfully applied in practice: away-steps FW, pairwise FW, fully-corrective FW, and Wolfe's minimum norm point algorithm, and proves for the first time that they all enjoy global linear convergence, under a weaker condition than strong convexity of the objective.
Stochastic Frank-Wolfe methods for nonconvex optimization
TLDR
For objective functions that decompose into a finite sum, ideas from variance reduction for convex optimization are leveraged to obtain new variance-reduced nonconvex Frank-Wolfe methods that have provably faster convergence than the classical Frank-Wolfe method.
Stochastic Conditional Gradient Methods: From Convex Minimization to Submodular Maximization
TLDR
Stochastic conditional gradient methods are proposed as an alternative: gradients are approximated via a simple averaging technique that requires a single stochastic gradient evaluation per iteration, and replacing the projection step of proximal methods with a linear program lowers the computational complexity of each iteration (a minimal sketch of this averaging idea appears after the reference list).
Linear Convergence of Stochastic Frank Wolfe Variants
TLDR
It is shown that the Away-step Stochastic Frank-Wolfe (ASFW) and Pairwise Stochastic Frank-Wolfe (PSFW) algorithms converge linearly in expectation, and also that if an algorithm converges linearly in expectation then it converges linearly almost surely.
Minimizing finite sums with the stochastic average gradient
TLDR
Numerical experiments indicate that the new SAG method often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
On Frank-Wolfe and Equilibrium Computation
We consider the Frank-Wolfe (FW) method for constrained convex optimization, and we show that this classical technique can be interpreted from a different perspective: FW emerges as the computation…
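As promised above, here is a minimal sketch of the gradient-averaging idea described in the "Stochastic Conditional Gradient Methods" entry: a single stochastic gradient per iteration is folded into a running estimate that is then passed to the linear minimization oracle. The oracle, schedules, and function names below are illustrative assumptions, not the paper's exact choices.

# Averaged-gradient conditional gradient sketch (illustrative, not the exact method of the cited paper).
import numpy as np

def averaged_stochastic_cg(grad_sample, lmo, x0, n_iters=1000):
    # grad_sample(x) returns ONE stochastic gradient at x; lmo(d) solves the
    # linear subproblem min over s in the constraint set of <d, s>.
    x = x0.copy()
    d = np.zeros_like(x0)                         # running gradient estimate
    for t in range(1, n_iters + 1):
        rho = 1.0 / t ** (2.0 / 3.0)              # averaging weight (typical schedule, assumed)
        d = (1 - rho) * d + rho * grad_sample(x)  # one stochastic gradient per iteration
        s = lmo(d)                                # linear program / LMO replaces a projection
        gamma = 1.0 / t                           # step size (illustrative)
        x = (1 - gamma) * x + gamma * s           # convex combination keeps x feasible
    return x

# Example usage with the L1-ball oracle from the earlier sketch (random gradients, for illustration only):
# x_hat = averaged_stochastic_cg(lambda x: np.random.randn(x.size), lambda d: lmo_l1_ball(d, 1.0), np.zeros(50))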