Corpus ID: 59553565

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication

@inproceedings{Koloskova2019DecentralizedSO,
  title={Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication},
  author={Anastasia Koloskova and Sebastian U. Stich and Martin Jaggi},
  booktitle={ICML},
  year={2019}
}
We consider decentralized stochastic optimization with the objective function (e.g. data samples for a machine learning task) being distributed over $n$ machines that can only communicate with their neighbors on a fixed communication graph. To reduce the communication bottleneck, the nodes compress (e.g. quantize or sparsify) their model updates. We cover both unbiased and biased compression operators with quality denoted by $\omega \leq 1$ ($\omega=1$ meaning no compression). We (i) propose a…
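To make the abstract concrete, the sketch below (a minimal illustration, not the authors' reference implementation) shows a top-$k$ compressor together with a compressed gossip-averaging loop in the spirit of the scheme described above. The consensus step size `gamma`, the mixing matrix `W`, the sparsification level `k`, and the iteration budget are illustrative assumptions.

```python
# Minimal sketch of compressed gossip averaging (illustrative assumptions:
# top-k compressor, fully connected mixing matrix, gamma = 0.01).
import numpy as np

def top_k(x, k):
    """Biased top-k sparsification: keep the k largest-magnitude entries.
    Satisfies ||top_k(x) - x||^2 <= (1 - k/d) ||x||^2, i.e. omega = k/d."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def compressed_gossip(x0, W, k, gamma=0.01, steps=3000):
    """Average the rows of x0 (one parameter vector per node) while exchanging
    only compressed differences between each node's state and its public copy."""
    x = x0.copy()
    x_hat = np.zeros_like(x)                      # publicly known compressed copies
    for _ in range(steps):
        q = np.stack([top_k(x[i] - x_hat[i], k)   # each node compresses the gap
                      for i in range(x.shape[0])])
        x_hat += q                                # everyone updates the public copies
        x += gamma * (W @ x_hat - x_hat)          # consensus step on public copies only
    return x

rng = np.random.default_rng(0)
n, d = 8, 20
x0 = rng.normal(size=(n, d))
W = np.full((n, n), 1.0 / n)                      # fully connected demo topology
x = compressed_gossip(x0, W, k=4)
print(np.abs(x - x0.mean(axis=0)).max())          # consensus error; shrinks with more steps
```

Because `W` is symmetric and doubly stochastic, the node average is preserved at every step, so reaching consensus means converging to the true average even though only compressed vectors are exchanged.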
Communication-Efficient Algorithms for Decentralized Optimization Over Directed Graphs
TLDR
These schemes build upon the observation that compressing the messages via sparsification implicitly alters the column-stochasticity of the mixing matrices of the directed network, a property that plays an important role in establishing convergence results for decentralized learning tasks.
On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization
TLDR
This work shows that gradient iterations with a constant step size enable convergence to within $\epsilon$ of the optimal value for smooth non-convex objectives satisfying the Polyak-Łojasiewicz condition; the result also holds for smooth strongly convex objectives.
Stochastic Distributed Learning with Gradient Quantization and Variance Reduction
We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication…
Communication-efficient Decentralized Local SGD over Undirected Networks
TLDR
The main results show that by using only $R=\Omega(n)$ communication rounds, one can achieve an error that scales as $O(1/(nT))$, where the number of communication rounds is independent of $T$ and only depends on the number of agents.
An Optimal Algorithm for Decentralized Finite Sum Optimization
TLDR
ADFS, which uses local stochastic proximal updates and decentralized communication between nodes, is shown to be optimal among decentralized algorithms via a matching complexity lower bound.
Decentralized Optimization on Time-Varying Directed Graphs Under Communication Constraints
TLDR
A communication-efficient algorithm for decentralized convex optimization that relies on sparsification of the local updates exchanged between neighboring agents in the network and is shown to converge in the considered settings.
An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums
TLDR
A theoretical analysis based on a novel augmented graph approach combined with a precise evaluation of synchronization times and an extension of the accelerated proximal coordinate gradient algorithm to arbitrary sampling is provided.
Quantized Push-sum for Gossip and Decentralized Optimization over Directed Graphs
TLDR
This work proposes a quantized decentralized stochastic learning algorithm over directed graphs, based on the push-sum protocol from decentralized consensus optimization, and proves that it achieves the same convergence rates as its exact-communication counterpart for both convex and non-convex losses.
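For orientation, the sketch below illustrates the plain (unquantized) push-sum / ratio-consensus primitive that such directed-graph schemes build on; the example graph and iteration count are illustrative assumptions, and the quantization layer discussed above is deliberately omitted.

```python
# Minimal sketch of push-sum (ratio consensus) on a directed graph: each node
# forwards shares of a value and a weight along its out-edges, and the ratio
# value/weight converges to the global average even though the column-stochastic
# mixing matrix is not doubly stochastic.
import numpy as np

# directed graph on 4 nodes given by out-neighbor lists (self-loops included)
out_neighbors = {0: [0, 1], 1: [1, 2], 2: [2, 3, 0], 3: [3, 0]}
n = 4

# column-stochastic mixing matrix: column j splits node j's mass over its out-neighbors
P = np.zeros((n, n))
for j, outs in out_neighbors.items():
    for i in outs:
        P[i, j] = 1.0 / len(outs)

x = np.array([1.0, 2.0, 3.0, 4.0])   # values to average
w = np.ones(n)                       # push-sum weights
for _ in range(100):
    x = P @ x                        # push value shares along out-edges
    w = P @ w                        # push weight shares along out-edges

print(x / w)                         # every entry approaches the average 2.5
```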
Robust Distributed Accelerated Stochastic Gradient Methods for Multi-Agent Networks
TLDR
A framework is developed for choosing the step size and momentum parameters of these algorithms so as to optimize performance by systematically trading off bias, variance, robustness to gradient noise, and dependence on network effects.
On the Communication Latency of Wireless Decentralized Learning
TLDR
The communication delay for a single round of exchanging gradients on all the links throughout the network scales as $\mathcal{O}\left(\frac{n^{2-3\beta}}{\beta\log n}\right)$, increasing (at different rates) with both the number of nodes and the gradient exchange threshold distance.
...

References

Showing 1-10 of 65 references
Optimal Algorithms for Smooth and Strongly Convex Distributed Optimization in Networks
TLDR
The efficiency of MSDA against state-of-the-art methods is verified on two problems: least-squares regression and classification by logistic regression.
Communication-efficient algorithms for decentralized and stochastic optimization
TLDR
A new class of decentralized primal–dual algorithms, the decentralized communication sliding (DCS) methods, is presented; these methods can skip inter-node communication while agents solve their primal subproblems iteratively through linearizations of their local objective functions.
Randomized gossip algorithms
TLDR
This work analyzes the averaging problem under the gossip constraint for an arbitrary network graph, and finds that the averaging time of a gossip algorithm depends on the second largest eigenvalue of a doubly stochastic matrix characterizing the algorithm.
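A minimal sketch of the randomized pairwise gossip step analyzed in this line of work is given below, assuming uniformly random edge activation; the ring topology and iteration budget are illustrative.

```python
# Minimal sketch of randomized pairwise gossip averaging: at every step one edge
# is activated uniformly at random and its two endpoints replace their values
# with the pairwise average, so the global average is preserved throughout.
import random

def randomized_gossip(values, edges, steps=10_000, seed=0):
    x = list(values)
    rng = random.Random(seed)
    for _ in range(steps):
        i, j = rng.choice(edges)          # activate a random edge
        avg = 0.5 * (x[i] + x[j])
        x[i] = x[j] = avg                 # pairwise averaging preserves the sum
    return x

# Example: ring of 5 nodes; all values approach the true average 3.0.
ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(randomized_gossip([1.0, 2.0, 3.0, 4.0, 5.0], ring))
```

How quickly all entries reach the average is governed by the spectral properties of the (expected) doubly stochastic matrix induced by the edge-activation probabilities, as described in the summary above.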
Communication Compression for Decentralized Training
TLDR
This paper develops a framework for quantized, decentralized training and proposes two strategies, called extrapolation compression and difference compression, which significantly outperform the best of the merely decentralized and merely quantized algorithms on networks with high latency and low bandwidth.
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
TLDR
This work develops and analyzes distributed algorithms based on dual subgradient averaging, provides sharp bounds on their convergence rates as a function of network size and topology, and shows that the number of iterations required scales inversely with the spectral gap of the network.
Distributed Average Consensus With Dithered Quantization
TLDR
An upper bound on the mean-square-error performance of the probabilistically quantized distributed averaging (PQDA) is derived and it is shown that the convergence of the PQDA is monotonic by studying the evolution of the minimum-length interval containing the node values.
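To make "dithered quantization" concrete, here is a minimal sketch of an unbiased probabilistic quantizer to a uniform grid; the grid spacing is an illustrative assumption, not the paper's setting.

```python
# Minimal sketch of dithered (probabilistic) quantization to a grid of spacing
# `delta`: each value is rounded up or down at random with probabilities chosen
# so the quantizer is unbiased, i.e. E[q(x)] = x.
import numpy as np

def dithered_quantize(x: np.ndarray, delta: float, rng: np.random.Generator) -> np.ndarray:
    scaled = x / delta
    lower = np.floor(scaled)
    p_up = scaled - lower                 # probability of rounding up
    up = rng.random(x.shape) < p_up
    return (lower + up) * delta

rng = np.random.default_rng(0)
x = np.array([0.30, -1.74, 2.05])
print(dithered_quantize(x, delta=0.5, rng=rng))   # entries land on multiples of 0.5
```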
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
TLDR
The error due to limits in communication resources decreases at a fast rate even in the case of non-strongly-convex objective functions, and the first optimal first-order decentralized algorithm, multi-step primal-dual (MSPD), is provided together with its corresponding optimal convergence rate.
Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
TLDR
This paper studies a D-PSGD algorithm and provides the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent.
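The sketch below shows D-PSGD-style updates (mix with neighbors, then take a local stochastic gradient step) on a toy least-squares problem; the data, ring mixing matrix, and step size are illustrative assumptions rather than the paper's experimental setup.

```python
# Minimal sketch of decentralized parallel SGD on a toy least-squares problem:
# each node averages its model with its neighbors' models via a doubly
# stochastic mixing matrix, then takes a local stochastic gradient step.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 4, 5, 50                        # nodes, dimension, samples per node
A = rng.normal(size=(n, m, d))            # local data matrices
b = rng.normal(size=(n, m))               # local targets

# ring-topology mixing matrix (symmetric, doubly stochastic)
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.25

x = np.zeros((n, d))                      # one model copy per node
eta = 0.01
for _ in range(2000):
    x = W @ x                             # gossip/mixing step with neighbors
    for i in range(n):                    # local stochastic gradient step
        s = rng.integers(m)
        grad = (A[i, s] @ x[i] - b[i, s]) * A[i, s]
        x[i] -= eta * grad

print(np.linalg.norm(x[0] - x[1]))        # consensus gap between nodes stays small
```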
Fast Distributed Gradient Methods
TLDR
This work proposes two fast distributed gradient algorithms based on the centralized Nesterov gradient algorithm and establishes their convergence rates in terms of the per-node communications K and the per-node gradient evaluations k.
Quantized Decentralized Consensus Optimization
TLDR
This work proposes the Quantized Decentralized Gradient Descent (QDGD) algorithm, in which nodes update their local decision variables by combining the quantized information received from their neighbors with their local information, and proves that under standard strong convexity and smoothness assumptions for local cost functions, QDGD achieves a vanishing mean solution error.
...