• Corpus ID: 13208387

Adding vs. Averaging in Distributed Primal-Dual Optimization

@inproceedings{Ma2015AddingVA,
  title={Adding vs. Averaging in Distributed Primal-Dual Optimization},
  author={Chenxin Ma and Virginia Smith and Martin Jaggi and Michael I. Jordan and Peter Richt{\'a}rik and Martin Tak{\'a}{\v{c}}},
  booktitle={ICML},
  year={2015}
}
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (CoCoA) for distributed optimization. Our framework, CoCoA+, allows for additive combination of local updates to the global parameters at each iteration…
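The key choice described above is the aggregation parameter: averaging local updates (as in CoCoA) versus adding them under suitably rescaled local subproblems (CoCoA+). The toy Python/NumPy sketch below only illustrates that aggregation step on a synthetic least-squares problem; the gradient-step local solver, the data split, and all step sizes are illustrative assumptions, not the paper's SDCA-based local solvers or its safe subproblem scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 400, 20, 4                          # samples, features, machines
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)
parts = np.array_split(np.arange(n), K)       # disjoint data partition

def local_update(w, idx, steps=20, lr=1e-4):
    """Stand-in local solver: a few gradient steps on one machine's data
    (the paper uses arbitrary local dual solvers such as SDCA instead)."""
    delta = np.zeros_like(w)
    for _ in range(steps):
        g = A[idx].T @ (A[idx] @ (w + delta) - b[idx])
        delta -= lr * g
    return delta

def run(gamma, rounds=25):
    """gamma = 1/K corresponds to averaging the local updates; gamma = 1
    corresponds to adding them (made safe in CoCoA+ by rescaling the local
    subproblems, which this toy sketch only mimics via a small local lr)."""
    w = np.zeros(d)
    for _ in range(rounds):
        deltas = [local_update(w, idx) for idx in parts]   # done in parallel
        w = w + gamma * np.sum(deltas, axis=0)             # one communication
    return 0.5 * np.linalg.norm(A @ w - b) ** 2            # global objective

print("averaging (gamma = 1/K):", run(1.0 / K))
print("adding    (gamma = 1):  ", run(1.0))
```

With these illustrative settings, the added combination (gamma = 1) typically reaches a lower objective than plain averaging (gamma = 1/K) in the same number of communication rounds.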

Citations

An accelerated communication-efficient primal-dual optimization framework for structured machine learning
TLDR
An accelerated variant of CoCoA+ is proposed and shown to possess a convergence rate guarantee in terms of reducing suboptimality, and the results of numerical experiments are provided to show that acceleration can lead to significant performance gains.
Distributed Primal-Dual Optimization for Non-uniformly Distributed Data
TLDR
This work develops a computationally efficient algorithm to automatically choose the optimal weights for each machine in the primal-dual optimization framework and proposes an efficient way to estimate the duality gap of the merged update by exploiting the structure of the objective function.
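Duality-gap certificates of this kind are central to the whole CoCoA family. As a point of reference, the Python/NumPy sketch below computes the duality gap for a plain L2-regularized hinge-loss SVM on synthetic data; the data, regularization constant, and the arbitrary feasible dual point alpha are illustrative assumptions, and the cited work's weighting scheme for non-uniformly distributed data is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 200, 10, 0.1                     # synthetic data and regularizer
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))

def primal(w):
    """P(w) = (1/n) sum_i hinge(y_i, x_i.w) + (lam/2) ||w||^2"""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w))
    return hinge.mean() + 0.5 * lam * w @ w

def dual(alpha):
    """D(alpha) for the hinge loss, with w(alpha) = X'(alpha*y) / (lam*n)."""
    v = (X.T @ (alpha * y)) / (lam * n)
    return alpha.mean() - 0.5 * lam * v @ v

alpha = rng.uniform(0.0, 1.0, size=n)        # any feasible dual point in [0,1]^n
w = (X.T @ (alpha * y)) / (lam * n)          # primal point mapped from alpha
print("duality gap:", primal(w) - dual(alpha))   # >= 0, and -> 0 at optimality
```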
DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization
TLDR
This paper works with the saddle-point formulation of large linear models with convex loss functions, and proposes a family of randomized primal-dual block coordinate algorithms that are especially suitable for asynchronous distributed implementation with parameter servers.
Distributed Optimization for Non-Strongly Convex Regularizers
TLDR
The ProxCoCoA+ method is presented, a generalization of the CoCoA+ framework to problems with non-strongly convex regularizers.
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
TLDR
This work presents a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing, and extends the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso.
Distributed optimization with arbitrary local solvers
TLDR
This work presents a framework for distributed optimization that allows the flexibility of using arbitrary solvers locally on each machine while maintaining competitive performance against other state-of-the-art special-purpose distributed methods.
Communication-Efficient Distributed Primal-Dual Algorithm for Saddle Point Problem
TLDR
This paper proposes a novel communication-efficient distributed optimization framework based on primal-dual methods for solving the convex-concave saddle point problem, provides a convergence analysis of the proposed algorithm, and extends it to non-smooth and non-strongly convex loss functions.
Stochastic, Distributed and Federated Optimization for Machine Learning
TLDR
This work proposes novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives in the distributed setting, and introduces the concept of Federated Optimization/Learning, in which machine learning problems are solved without the data being stored in any centralized manner.
Stochastic, Distributed and Federated Optimization for Machine Learning
TLDR
This work proposes novel variants of stochastic gradient descent with a variance reduction property that enables linear convergence for strongly convex objectives in the distributed setting, and introduces the concept of Federated Optimization/Learning, whose main motivation comes from industrial applications that handle user-generated data.
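The variance-reduction property mentioned in the two entries above is commonly instantiated by SVRG. The Python/NumPy sketch below is a generic SVRG loop on a ridge-regression objective, offered only as a reminder of the mechanism; the data, step size, and epoch count are assumptions, and it is not the thesis's specific variants or their federated extensions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, lam = 500, 20, 0.1
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def full_grad(w):
    return X.T @ (X @ w - y) / n + lam * w

def stoch_grad(w, i):
    return X[i] * (X[i] @ w - y[i]) + lam * w

w = np.zeros(d)
lr = 0.005
for epoch in range(20):
    w_snap = w.copy()
    g_snap = full_grad(w_snap)               # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # Variance-reduced estimate: unbiased, with variance shrinking as
        # both the iterate and the snapshot approach the optimum.
        g = stoch_grad(w, i) - stoch_grad(w_snap, i) + g_snap
        w -= lr * g

print("objective:", 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w)
```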
Distributed Optimization for Non-Strongly Convex Regularizers
TLDR
The ProxCoCoA+ method is presented, a generalization of the CoCoA+ algorithm to the case of general non-strongly convex regularizers; in addition, two new optimization methods that combine distributed and parallel optimization techniques and achieve significant speed-ups with respect to their non-wild variants are explored both theoretically and experimentally.

References

Showing 1-10 of 45 references
Communication-Efficient Distributed Dual Coordinate Ascent
TLDR
A communication-efficient framework that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication is proposed, and a strong convergence rate analysis is provided for this class of algorithms.
Distributed optimization with arbitrary local solvers
TLDR
This work presents a framework for distributed optimization that allows the flexibility of using arbitrary solvers locally on each machine while maintaining competitive performance against other state-of-the-art special-purpose distributed methods.
Randomized Dual Coordinate Ascent with Arbitrary Sampling
TLDR
This work proposes and analyzes a novel primal-dual method (Quartz) which at every iteration samples and updates a random subset of the dual variables, chosen according to an arbitrary distribution, and generates efficient serial, parallel and distributed variants of the method.
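To make the "arbitrary sampling" idea concrete, the Python/NumPy sketch below runs dual coordinate ascent for ridge regression while drawing coordinates from a non-uniform distribution (here, proportional to squared row norms, an arbitrary illustrative choice). The update is the standard closed-form SDCA step for the squared loss, not Quartz's actual rule, its step-size parameters, or its parallel and distributed samplings.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 300, 15, 0.1
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Sampling probabilities: one arbitrary (non-uniform) choice, here
# proportional to the squared norms of the data points.
p = np.linalg.norm(X, axis=1) ** 2
p /= p.sum()

alpha = np.zeros(n)
w = np.zeros(d)                          # maintained as w = X.T @ alpha / (lam*n)
sq = np.einsum('ij,ij->i', X, X)         # ||x_i||^2

for _ in range(20 * n):
    i = rng.choice(n, p=p)
    # Exact coordinate maximization of the SDCA dual for the squared loss.
    delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + sq[i] / (lam * n))
    alpha[i] += delta
    w += delta * X[i] / (lam * n)

primal = 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w
print("primal objective after non-uniform dual coordinate ascent:", primal)
```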
Analysis of Distributed Stochastic Dual Coordinate Ascent
TLDR
The analysis, supported by empirical studies, shows that an exponential speed-up in convergence can be obtained by increasing the number of dual updates at each iteration, which justifies the superior performance of the practical DisDCA variant compared to the naive one.
Distributed stochastic optimization and learning
  • O. Shamir, Nathan Srebro
  • 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2014
TLDR
It is shown how the best known guarantees are obtained by an accelerated mini-batched SGD approach, and the runtime and sample costs of the approach are compared with those of other distributed optimization algorithms.
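As a rough illustration of what accelerated mini-batched SGD looks like in code, the Python/NumPy sketch below runs mini-batch SGD with Nesterov momentum on a synthetic least-squares problem. The batch size, step size, and momentum value are illustrative assumptions, and the sketch makes no attempt to reproduce the guarantees discussed in the reference.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 1000, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
v = np.zeros(d)                       # momentum (velocity) buffer
lr, mom, batch = 1e-3, 0.9, 64        # illustrative hyperparameters

for _ in range(2000):
    idx = rng.choice(n, size=batch, replace=False)
    lookahead = w + mom * v           # Nesterov: gradient at the look-ahead point
    grad = X[idx].T @ (X[idx] @ lookahead - y[idx]) / batch
    v = mom * v - lr * grad
    w = w + v

print("mean squared error:", np.mean((X @ w - y) ** 2))
```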
Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
TLDR
A distributed optimization algorithm based on a stochastic dual coordinate ascent method is presented, an analysis of the tradeoff between computation and communication is conducted, and competitive performance is observed.
Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss
TLDR
A communication-efficient distributed algorithm, based on an inexact damped Newton method, is proposed to minimize the overall empirical loss, which is the average of the local empirical losses across the distributed computing system.
Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM
TLDR
This paper proposes a box-constrained quadratic optimization algorithm for distributed training of linear support vector machines (SVMs) on large data, using a method that requires only O(1) communication cost to ensure fast convergence.
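The box constraints in question come from the dual of the linear SVM, where each dual variable must stay in [0, C]. The Python/NumPy sketch below shows the classic single-machine box-constrained dual coordinate descent update (with clipping) on synthetic data; the cited paper's distributed scheme and its O(1)-communication strategy are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, C = 500, 20, 1.0
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n))

alpha = np.zeros(n)
w = np.zeros(d)                           # w = sum_i alpha_i * y_i * x_i
Qii = np.einsum('ij,ij->i', X, X)         # diagonal of the dual Hessian

for _ in range(20):                       # passes over the data
    for i in rng.permutation(n):
        G = y[i] * (w @ X[i]) - 1.0       # dual gradient at coordinate i
        a_new = np.clip(alpha[i] - G / Qii[i], 0.0, C)   # box constraint [0, C]
        w += (a_new - alpha[i]) * y[i] * X[i]
        alpha[i] = a_new

primal = 0.5 * w @ w + C * np.maximum(0.0, 1.0 - y * (X @ w)).sum()
print("primal SVM objective:", primal)
```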
Parallel coordinate descent methods for big data optimization
In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function.
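A minimal Python/NumPy illustration of the parallel update pattern is sketched below: a random subset of tau coordinates of a least-squares objective is updated from the same residual, with a conservative 1/tau damping that keeps the combined step safe. The sharper step sizes the paper derives from partial separability are not reproduced, and the smooth objective and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, d, tau = 400, 50, 10
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

L = np.einsum('ij,ij->j', X, X)           # coordinate-wise Lipschitz constants
w = np.zeros(d)
for _ in range(500):
    S = rng.choice(d, size=tau, replace=False)
    r = X @ w - y                         # residual, read by all tau updates
    # Each coordinate's step could run on a different core/machine in parallel.
    for j in S:
        g_j = X[:, j] @ r                 # partial derivative with respect to w_j
        w[j] -= g_j / (tau * L[j])        # conservative 1/tau damping keeps it safe

print("objective:", 0.5 * np.mean((X @ w - y) ** 2))
```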
Communication-Efficient Distributed Optimization using an Approximate Newton-type Method
TLDR
A novel Newton-type method for distributed optimization is presented; it is particularly well suited for stochastic optimization and learning problems, and enjoys a linear rate of convergence that provably improves with the data size.
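To give a flavor of such Newton-type steps, the Python/NumPy sketch below performs a DANE-like iteration for distributed least squares: every machine preconditions the global gradient with its own local Hessian plus a damping term, and the resulting steps are averaged. The quadratic loss, the parameters eta and mu, and the partitioning are illustrative assumptions, not the cited method as stated for general losses.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, K = 800, 20, 4
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
parts = np.array_split(np.arange(n), K)           # data partition over machines

eta, mu = 1.0, 1.0                                # illustrative parameters
w = np.zeros(d)
for _ in range(10):                               # communication rounds
    global_grad = X.T @ (X @ w - y) / n           # would be an all-reduce in practice
    steps = []
    for idx in parts:                             # computed locally on each machine
        H_k = X[idx].T @ X[idx] / len(idx)        # local Hessian
        steps.append(np.linalg.solve(H_k + mu * np.eye(d), global_grad))
    w = w - eta * np.mean(steps, axis=0)          # average the locally computed steps

print("residual norm:", np.linalg.norm(X @ w - y))
```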