• Corpus ID: 202778044

# Order Optimal One-Shot Distributed Learning

@inproceedings{Sharifnassab2019OrderOO,
title={Order Optimal One-Shot Distributed Learning},
author={Arsalan Sharifnassab and Saber Salehkaleybar and S. Jamaloddin Golestani},
booktitle={NeurIPS},
year={2019}
}
• Published in NeurIPS, 1 November 2019
• Computer Science
We consider distributed statistical optimization in the one-shot setting, where there are $m$ machines, each observing $n$ i.i.d. samples. Based on its observed samples, each machine sends an $O(\log(mn))$-length message to a server, at which a parameter minimizing an expected loss is to be estimated. We propose an algorithm called Multi-Resolution Estimator (MRE) whose expected error is no larger than $\tilde{O}( m^{-1/\max(d,2)} n^{-1/2})$, where $d$ is the dimension of the parameter space…
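To make the one-shot setting concrete, here is a minimal, illustrative simulation (not the MRE algorithm itself): $m$ machines each estimate a one-dimensional parameter from $n$ Gaussian samples, quantize the local estimate to a $\lceil\log_2(mn)\rceil$-bit message, and the server averages the messages. The parameter range $[-10, 10]$ and all names are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.7          # unknown parameter to estimate
m, n = 100, 50       # machines, samples per machine

# Message budget: each machine may send O(log(mn)) bits.
bits = int(np.ceil(np.log2(m * n)))
levels = 2 ** bits
lo, hi = -10.0, 10.0  # assumed known parameter range

def quantize(x):
    """Round x to one of `levels` uniformly spaced values in [lo, hi]."""
    idx = np.clip(round((x - lo) / (hi - lo) * (levels - 1)), 0, levels - 1)
    return lo + idx * (hi - lo) / (levels - 1)

# Each machine computes a local mean from its n i.i.d. samples
# and sends the quantized value as its short message.
messages = [quantize(rng.normal(theta, 1.0, n).mean()) for _ in range(m)]

# The server aggregates the short messages by simple averaging.
estimate = np.mean(messages)
print(abs(estimate - theta))
```

In this simple one-dimensional Gaussian case, plain averaging of quantized local means is already near optimal; MRE's multi-resolution scheme is what extends order-optimal error to general losses and higher dimensions.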
## 5 Citations

### One-Shot Federated Learning: Theoretical Limits and Algorithms to Achieve Them

J. Mach. Learn. Res., 2021
An estimator is proposed, which is called Multi-Resolution Estimator (MRE), whose expected error (when $B\ge\log mn$) meets the aforementioned lower bound up to poly-logarithmic factors, and is thereby order optimal.

### LOSP: Overlap Synchronization Parallel With Local Compensation for Fast Distributed Training

IEEE Journal on Selected Areas in Communications, 2021
A new method named LOSP is proposed, which introduces local compensation into the overlap synchronization mechanism to mitigate the adverse effects of overlapping synchronization; it is proved theoretically that LOSP preserves the same convergence rate as sequential SGD for non-convex problems.

### Distilled One-Shot Federated Learning

arXiv, 2020
The proposed Distilled One-Shot Federated Learning reduces the number of communication rounds required to train a performant model to only one, and represents a new direction, orthogonal to previous work, toward weight-less and gradient-less federated learning.

### FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning

arXiv, 2022
The proposed FedDM reduces communication rounds and improves model quality by transmitting smaller, more informative synthesized data instead of unwieldy model weights; results show that the method outperforms other FL counterparts in both efficiency and model performance.

### Non-IID Distributed Learning with Optimal Mixture Weights

2022
Distributed learning can effectively solve the problem of training models on large-scale data, and it has attracted much attention in recent years. However, most existing distributed learning algorithms…

## References

Showing 1–10 of 19 references

### One-Shot Federated Learning: Theoretical Limits and Algorithms to Achieve Them

J. Mach. Learn. Res., 2021
An estimator is proposed, which is called Multi-Resolution Estimator (MRE), whose expected error (when $B\ge\log mn$) meets the aforementioned lower bound up to poly-logarithmic factors, and is thereby order optimal.

### Communication lower bounds for statistical estimation problems via a distributed data processing inequality

STOC, 2016
A distributed data processing inequality is proved, as a generalization of usual data processing inequalities, which might be of independent interest and useful for other problems.

### Communication-efficient algorithms for statistical optimization

51st IEEE Conference on Decision and Control (CDC), 2012
A sharp analysis of this average mixture algorithm is provided, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as $O(N^{-1} + (N/m)^{-2})$.
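The "average mixture" idea summarized above can be sketched as follows: each machine solves its own empirical risk minimization problem exactly, and the server averages the $m$ local solutions in a single round of communication. The least-squares setup and all names here are illustrative assumptions, not the paper's general setting.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, d = 10, 200, 3                 # machines, samples per machine, dimension
w_true = np.array([0.5, -1.0, 2.0])

def local_erm(X, y):
    """Each machine solves its own least-squares problem exactly."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulate m machines, each fitting on its own shard of N = m*n samples.
local_solutions = []
for _ in range(m):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    local_solutions.append(local_erm(X, y))

# The "average mixture" estimate: one round of communication, then average.
w_avg = np.mean(local_solutions, axis=0)
print(np.linalg.norm(w_avg - w_true))
```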

Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems, 2018
This paper proposes a simple variant of the classical gradient descent method and proves that the aggregated gradient, as a function of model parameter, converges uniformly to the true gradient function.

### Communication-Efficient Distributed Learning of Discrete Distributions

NIPS, 2017
This work designs distributed learning algorithms that achieve significantly better communication guarantees than the naive ones, and obtains tight upper and lower bounds in several regimes of distribution learning.

### Distributed Semi-supervised Learning with Kernel Ridge Regression

J. Mach. Learn. Res., 2017
This paper provides an error analysis for distributed semi-supervised learning with kernel ridge regression (DSKRR) based on a divide-and-conquer strategy, and shows that unlabeled data play an important role in reducing the distributed error and in allowing a larger number of data subsets in DSKRR.

### Communication-Efficient Learning of Deep Networks from Decentralized Data

AISTATS, 2017
This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
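The iterative model averaging described above (commonly known as FedAvg) can be sketched on a toy linear-regression problem; the client count, learning rate, and local epoch count here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
K, n, d = 5, 100, 2                  # clients, samples per client, dimension
w_true = np.array([1.0, -2.0])

# Each client holds a private shard of linear-regression data.
shards = []
for _ in range(K):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.05 * rng.normal(size=n)
    shards.append((X, y))

def local_sgd(w, X, y, epochs=5, lr=0.05):
    """Run a few epochs of gradient descent on one client's data."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w = np.zeros(d)
for _ in range(20):                  # communication rounds
    # Each round: broadcast w, train locally, then average the results.
    w = np.mean([local_sgd(w.copy(), X, y) for X, y in shards], axis=0)

print(np.linalg.norm(w - w_true))
```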

### Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling

IEEE Transactions on Automatic Control, 2012
This work develops and analyzes distributed algorithms based on dual subgradient averaging, provides sharp bounds on their convergence rates as a function of the network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of the network.

### Deep learning with Elastic Averaging SGD

NIPS, 2015
Experiments demonstrate that the new algorithm accelerates the training of deep architectures compared to DOWNPOUR and other common baseline approaches and furthermore is very communication efficient.
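As a rough sketch of the elastic averaging update (with an assumed toy quadratic objective per worker): each worker takes gradient steps while being pulled elastically toward a shared center variable, which in turn drifts toward the workers' average.

```python
import numpy as np

rng = np.random.default_rng(3)
p, d = 4, 3                          # workers, dimension
targets = rng.normal(size=(p, d))    # worker i minimizes ||x - targets[i]||^2

eta, rho = 0.1, 0.5                  # step size and elastic coupling strength
xs = np.zeros((p, d))                # local worker variables
center = np.zeros(d)                 # shared center variable

for _ in range(200):
    for i in range(p):
        grad = 2 * (xs[i] - targets[i])        # local gradient
        # The elastic term pulls each worker toward the center...
        xs[i] -= eta * (grad + rho * (xs[i] - center))
    # ...while the center moves toward the workers' average.
    center += eta * rho * (xs - center).sum(axis=0)

# At the fixed point the center sits at the minimizer of the summed
# objectives (here, the mean of the per-worker targets).
print(np.linalg.norm(center - targets.mean(axis=0)))
```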

### Communication-Efficient Distributed Statistical Inference

Journal of the American Statistical Association, 2018
CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation, and Bayesian inference and significantly improves the computational efficiency of Markov chain Monte Carlo algorithms even in a nondistributed setting.
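For a quadratic loss, one round of a CSL-style surrogate has a closed form, which makes the idea easy to sketch: a designated machine minimizes its local loss shifted by the gap between its local gradient and the globally averaged gradient, which here reduces to a Newton-like step using the local Hessian and the global gradient. The linear-regression setup and all names are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, d = 8, 100, 3
w_true = np.array([1.0, 0.5, -1.5])

# Shard a linear-regression dataset across m machines.
shards = []
for _ in range(m):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    shards.append((X, y))

def grad(w, X, y):
    """Gradient of the local loss ||X w - y||^2 / n."""
    return 2 * X.T @ (X @ w - y) / len(y)

# Start from machine 1's local least-squares solution.
X1, y1 = shards[0]
w0 = np.linalg.lstsq(X1, y1, rcond=None)[0]

# One communication round: every machine reports its gradient at w0.
global_grad = np.mean([grad(w0, X, y) for X, y in shards], axis=0)

# Machine 1 minimizes the surrogate L1(w) - <grad L1(w0) - global_grad, w>.
# Since grad L1(w0) = 0 at the local minimizer, this is a Newton-like step
# with the local Hessian and the *global* gradient.
H1 = 2 * X1.T @ X1 / n
w1 = w0 - np.linalg.solve(H1, global_grad)
print(np.linalg.norm(w1 - w_true))
```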