Corpus ID: 14677731

Distributed Mini-Batch SDCA

@article{Takc2015DistributedMS,
  title={Distributed Mini-Batch SDCA},
  author={Martin Tak{\'a}{\v{c}} and Peter Richt{\'a}rik and Nathan Srebro},
  journal={ArXiv},
  year={2015},
  volume={abs/1507.08322}
}
We present an improved analysis of mini-batched stochastic dual coordinate ascent (SDCA) for regularized empirical loss minimization (i.e. SVM and SVM-type objectives). Our analysis allows for flexible sampling schemes, including schemes where the data is distributed across machines, and combines a dependence on the smoothness of the loss and/or the data spread (measured through the spectral norm).
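The setting is mini-batched SDCA applied to the dual of an L2-regularized loss such as the SVM hinge loss: a mini-batch of dual coordinates is sampled, each coordinate is updated in closed form, and the updates are aggregated with a safe step size whose admissible aggressiveness the paper ties to the spectral norm of the data. The sketch below is a minimal illustration of that scheme for the hinge loss, not the paper's exact method; the function name, default parameters, and the conservative aggregation parameter beta (defaulting to the always-safe choice beta = batch size rather than a spectral-norm-based value) are assumptions made for this example.

```python
import numpy as np

def minibatch_sdca_svm(X, y, lam=0.01, batch_size=8, epochs=20, beta=None, seed=0):
    """Mini-batched SDCA sketch for the L2-regularized hinge-loss (SVM) dual.

    beta is a conservative aggregation parameter; the paper ties the safe
    choice to the spectral norm of the data. Here we simply default to the
    batch size (the always-safe choice) if none is given.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    if beta is None:
        beta = batch_size          # always-safe aggregation (worst case)
    alpha = np.zeros(n)            # dual variables, one per example, in [0, 1]
    w = np.zeros(d)                # primal iterate: w = (1/(lam*n)) * X^T (alpha*y)
    sq_norms = np.einsum('ij,ij->i', X, X)

    for _ in range(epochs):
        for _ in range(n // batch_size):
            batch = rng.choice(n, size=batch_size, replace=False)
            delta = np.zeros(batch_size)
            for k, i in enumerate(batch):
                # closed-form coordinate maximization of the dual at example i,
                # damped by 1/beta so that simultaneous updates remain safe
                grad = 1.0 - y[i] * (X[i] @ w)
                step = grad * lam * n / max(sq_norms[i], 1e-12) / beta
                delta[k] = np.clip(alpha[i] + step, 0.0, 1.0) - alpha[i]
            # apply the whole mini-batch update at once (what a distributed
            # implementation would do after one round of communication)
            alpha[batch] += delta
            w += (X[batch] * (delta * y[batch])[:, None]).sum(axis=0) / (lam * n)
    return w, alpha

# toy usage: roughly separable 2-class data
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
    w, _ = minibatch_sdca_svm(X, y, lam=0.1, batch_size=10)
    print("train accuracy:", np.mean(np.sign(X @ w) == y))
```

Roughly speaking, smaller values of beta correspond to more aggressive aggregation of the mini-batch updates; the paper's contribution is to quantify, via the spectral norm of the data and the smoothness of the loss, how aggressive one can safely be, which is where the parallelization speedups come from.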

Citations

A General Distributed Dual Coordinate Optimization Framework for Regularized Loss Minimization
TLDR
This paper introduces a novel distributed dual formulation for regularized loss minimization problems that can directly handle data parallelism in the distributed setting, and develops an accelerated version of DADM (Acc-DADM) that significantly improves on previous state-of-the-art distributed dual coordinate optimization algorithms.
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
TLDR
This work presents a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing, and extends the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso.
Dual Free Adaptive Minibatch SDCA for Empirical Risk Minimization
TLDR
The novelty of the approach is that the coordinates to update at each iteration are selected non-uniformly from an adaptive probability distribution, and this extends the previously mentioned work which only allowed for a uniform selection of "dual" coordinates from a fixed probability distribution.
Gradient Diversity Empowers Distributed Learning
TLDR
It is proved that on problems with high gradient diversity, mini-batch SGD is amenable to better speedups, while maintaining the generalization performance of serial (one sample) SGD.
Gradient Diversity Empowers Distributed Learning: Convergence and Stability of Mini-batch SGD
TLDR
It is proved that on problems with high gradient diversity, mini-batch SGD is amenable to better speedups, while maintaining the generalization performance of serial (one sample) SGD.
Parallelizing Stochastic Gradient Descent for Least Squares Regression: Mini-batching, Averaging, and Model Misspecification
TLDR
A novel analysis is developed for bounding the operators that arise in this setting, characterizing the excess risk of communication-efficient parallelization schemes such as model-averaging/parameter-mixing methods; the techniques are of broader interest in analyzing computational aspects of stochastic approximation.
An accelerated communication-efficient primal-dual optimization framework for structured machine learning
TLDR
An accelerated variant of CoCoA+ is proposed and shown to possess an improved convergence rate in terms of reducing suboptimality, and numerical experiments show that acceleration can lead to significant performance gains.
Distributed Asynchronous Dual-Free Stochastic Dual Coordinate Ascent
TLDR
This paper proposes the Distributed Asynchronous Dual-Free Stochastic Dual Coordinate Ascent method (dis-dfSDCA) and proves that it enjoys a linear convergence rate when the problem is convex and smooth, even under the stale gradient updates that are common in asynchronous methods.
Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging
TLDR
This work presents the first tight non-asymptotic generalization error bounds for these schemes for the stochastic approximation problem of least squares regression, and establishes a precise problem-dependent extent to which mini-batching can be used to yield provable near-linear parallelization speedups over SGD with batch size one.
Primal-Dual Rates and Certificates
TLDR
An algorithm-independent framework is proposed to equip existing optimization methods with primal-dual certificates, providing efficiently computable duality gaps that are globally defined, without modifying the original problems in the region of interest.

References

Showing 1-10 of 31 references
Mini-Batch Primal and Dual Methods for SVMs
TLDR
It is shown that the same quantity, the spectral norm of the data, controls the parallelization speedup obtained for both primal stochastic subgradient descent (SGD) and stochastic dual coordinate ascent (SDCA) methods, and is used to derive novel variants of mini-batched SDCA.
Stochastic dual coordinate ascent methods for regularized loss
TLDR
A new analysis of Stochastic Dual Coordinate Ascent (SDCA) is presented, showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD.
Better Mini-Batch Algorithms via Accelerated Gradient Methods
TLDR
A novel analysis is provided, which shows how standard gradient methods may sometimes be insufficient to obtain a significant speed-up and a novel accelerated gradient algorithm is proposed, which deals with this deficiency, enjoys a uniformly superior guarantee and works well in practice.
Fast distributed coordinate descent for non-strongly convex losses
TLDR
An efficient distributed randomized coordinate descent method for minimizing regularized non-strongly convex loss functions and is capable of solving a (synthetic) LASSO optimization problem with 50 billion variables.
Adding vs. Averaging in Distributed Primal-Dual Optimization
TLDR
A novel generalization of the recent communication-efficient primal-dual framework (CoCoA) for distributed optimization is presented, which allows for additive combination of local updates to the global parameters at each iteration, whereas previous schemes with convergence guarantees only allowed conservative averaging.
Distributed stochastic optimization and learning
  • O. Shamir, Nathan Srebro
  • 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2014
TLDR
It is shown how the best known guarantees are obtained by an accelerated mini-batched SGD approach, and the runtime and sample costs of the approach are compared with those of other distributed optimization algorithms.
Communication-Efficient Distributed Dual Coordinate Ascent
TLDR
A communication-efficient framework that uses local computation in a primal-dual setting to dramatically reduce the amount of necessary communication is proposed, and a strong convergence rate analysis is provided for this class of algorithms.
Parallel Coordinate Descent for L1-Regularized Loss Minimization
TLDR
This work proves convergence bounds for Shotgun which predict linear speedups, up to a problem-dependent limit, and presents a comprehensive empirical study of Shotgun for Lasso and sparse logistic regression.
Distributed Coordinate Descent Method for Learning with Big Data
TLDR
This paper develops and analyzes Hydra: HYbriD cooRdinAte descent method for solving loss minimization problems with big data, and gives bounds on the number of iterations sufficient to approximately solve the problem with high probability.
Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting
TLDR
It is proved that, as long as the mini-batch size b is below a certain threshold, any predefined accuracy can be reached with less overall work than without mini-batching, and the method is suitable for further acceleration by parallelization.