A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

@inproceedings{Bellet2015ADF,
  title={A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning},
  author={Aur{\'e}lien Bellet and Yingyu Liang and Alireza Bagheri Garakani and Maria-Florina Balcan and Fei Sha},
  booktitle={SDM},
  year={2015}
}
Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting where the elements to be combined are not centrally located but spread over a network. We address the key challenges of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error $\epsilon$ and communication cost that do… 
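For context on the follow-up work listed below: the Frank-Wolfe (conditional gradient) method replaces projections with a linear minimization oracle, and over an l1-ball that oracle returns a single signed coordinate, so each iteration adds one atom and the iterate stays sparse. The sketch below shows only the classical centralized iteration under that setup; the function name frank_wolfe_l1, the step size 2/(t+2), and the toy least-squares problem are illustrative assumptions, not the authors' dFW code, and the paper's distributed atom-exchange protocol is not reproduced here.

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius=1.0, n_iters=100):
    """Classical Frank-Wolfe over the l1-ball {x : ||x||_1 <= radius}.

    Each iteration calls a linear minimization oracle instead of a
    projection; over the l1-ball the oracle returns a single signed
    coordinate vector, so the iterate stays sparse.
    """
    x = x0.copy()
    for t in range(n_iters):
        g = grad_f(x)
        # LMO: argmin_{||s||_1 <= radius} <g, s> is the vertex
        # -radius * sign(g_i) * e_i for the coordinate with largest |g_i|.
        i = int(np.argmax(np.abs(g)))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# Toy usage: sparse least squares, min ||A x - b||^2 s.t. ||x||_1 <= 1.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200))
x_true = np.sign(rng.standard_normal(200)) * (rng.random(200) < 0.03)
b = A @ x_true
x_hat = frank_wolfe_l1(lambda x: 2.0 * A.T @ (A @ x - b), np.zeros(200))
print("nonzeros in iterate:", np.count_nonzero(x_hat))
```

Roughly speaking, in the distributed setting of the paper, where the atoms are spread over a network, the appeal of this update is that only the greedily selected atom needs to be exchanged per round rather than a full dense gradient; the precise protocol and its trade-off between optimization error $\epsilon$ and communication cost are what the paper analyzes.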

Citations

Quantized Frank-Wolfe: Communication-Efficient Distributed Optimization
TLDR
Quantized Frank-Wolfe (QFW), the first projection-free and communication-efficient algorithm for solving constrained optimization problems at scale, is proposed and strong theoretical guarantees on the convergence rate of QFW are provided.
Quantized Frank-Wolfe: Faster Optimization, Lower Communication, and Projection Free
TLDR
Quantized Frank-Wolfe (QFW), the first projection-free and communication-efficient algorithm for solving constrained optimization problems at scale, is proposed and strong theoretical guarantees on the convergence rate of QFW are provided.
Communication-Efficient Projection-Free Algorithm for Distributed Optimization
TLDR
This paper proposes a distributed projection-free algorithm named Distributed Conditional Gradient Sliding (DCGS). Based on the primal-dual algorithm, it yields a modular analysis that can be exploited to improve the linear oracle complexity whenever centralized Frank-Wolfe can be improved.
D-FW: Communication efficient distributed algorithms for high-dimensional sparse optimization
TLDR
The novelty of this work is to develop communication-efficient algorithms using the stochastic Frank-Wolfe (sFW) algorithm, where the gradient computation is inexact but controllable.
Gradient compression for communication-limited convex optimization
TLDR
This paper establishes and strengthens the convergence guarantees for gradient descent under a family of gradient compression techniques, derives admissible step sizes, and quantifies both the number of iterations and the number of bits that need to be exchanged to reach a target accuracy.
On the Intersection of Communication and Machine Learning
TLDR
This thesis introduces a reinforcement learning framework to solve resource allocation problems in heterogeneous millimeter-wave networks and proposes a distributed coreset-based boosting framework.
A distributed Frank–Wolfe framework for learning low-rank matrices with the trace norm
TLDR
A theoretical analysis of the convergence of DFW-Trace is provided, showing that it can ensure sublinear convergence in expectation to an optimal solution with few power iterations per epoch.
Scalable Projection-Free Optimization
TLDR
This dissertation proposes 1-SFW, the first projection-free method that requires only one sample per iteration to update the optimization variable and yet achieves the best known complexity bounds for convex, non-convex, and monotone DR-submodular settings.
Communication-Efficient Asynchronous Stochastic Frank-Wolfe over Nuclear-norm Balls
TLDR
This work proposes an asynchronous Stochastic Frank-Wolfe (SFW-asyn) method which, for the first time, solves the two problems simultaneously while maintaining the same convergence rate as the vanilla SFW.
Learning Privately over Distributed Features: An ADMM Sharing Approach
TLDR
This paper introduces a novel differentially private ADMM sharing algorithm, bounds the privacy guarantee with carefully designed noise perturbation, and establishes convergence and iteration complexity results for the proposed parallel ADMM algorithm under non-convex losses.
...

References

Showing 1-10 of 50 references
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
TLDR
This work develops and analyzes distributed algorithms based on dual subgradient averaging, provides sharp bounds on their convergence rates as a function of the network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of the network.
Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation
Many machine learning approaches are characterized by information constraints on how they interact with the training data. These include memory and sequential access constraints (e.g. fast…
Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization
TLDR
A new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration consists of a low-rank update, is presented, and the broad application areas of this approach are discussed.
Distributed Learning, Communication Complexity and Privacy
TLDR
General upper and lower bounds on the amount of communication needed to learn well are provided, showing that in addition to VC-dimension and covering number, quantities such as the teaching-dimension and mistake-bound of a class play an important role.
Optimal Distributed Online Prediction
TLDR
The distributed mini-batch (DMB) framework, a method of converting a serial gradient-based online algorithm into a distributed algorithm, is presented, and an asymptotically optimal regret bound is proved for smooth convex loss functions and stochastic examples.
Distributed k-means and k-median clustering on general communication topologies
TLDR
A distributed method for constructing a global coreset is provided; it improves over previous methods by reducing the communication complexity and works over general communication topologies.
Consensus-Based Distributed Support Vector Machines
This paper develops algorithms to train support vector machines when training data are distributed across different nodes, and their communication to a centralized processing unit is prohibited due…
Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers
TLDR
It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
TLDR
A distributed optimization algorithm based on stochastic dual coordinate ascent is presented, the tradeoff between computation and communication is analyzed, and competitive performance is observed.
Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
TLDR
This paper develops a simple two-stage protocol, GREEDI, that is easily implemented using MapReduce-style computations, and demonstrates the effectiveness of the approach on several applications, including sparse Gaussian process inference and exemplar-based clustering, on tens of millions of data points using Hadoop.
...