• Corpus ID: 202778044

Order Optimal One-Shot Distributed Learning

@inproceedings{Sharifnassab2019OrderOO,
  title={Order Optimal One-Shot Distributed Learning},
  author={Arsalan Sharifnassab and Saber Salehkaleybar and S. Jamaloddin Golestani},
  booktitle={NeurIPS},
  year={2019}
}
We consider distributed statistical optimization in the one-shot setting, where there are $m$ machines, each observing $n$ i.i.d. samples. Based on its observed samples, each machine then sends an $O(\log(mn))$-length message to a server, at which a parameter minimizing an expected loss is to be estimated. We propose an algorithm called Multi-Resolution Estimator (MRE) whose expected error is no larger than $\tilde{O}( m^{-1/\max(d,2)} n^{-1/2})$, where $d$ is the dimension of the parameter space… 
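The abstract fixes the communication pattern (m machines, n i.i.d. samples each, a single O(log(mn))-bit message per machine) but not the estimator's internals. The snippet below is only a minimal sketch of that one-shot setting under an assumed toy task, one-dimensional Gaussian mean estimation with quantized local means; it is not the MRE algorithm, and all function and parameter names are illustrative.

import numpy as np

def one_shot_mean_estimate(samples_per_machine, num_bits=16, lo=-10.0, hi=10.0):
    """Illustrative one-shot protocol (NOT the paper's MRE estimator):
    each machine sends a single quantized scalar (its local sample mean)
    and the server averages the decoded messages."""
    levels = 2 ** num_bits
    messages = []
    for x in samples_per_machine:                 # x: the n i.i.d. samples of one machine
        local_mean = float(np.mean(x))
        # quantize the local mean to num_bits bits on the fixed grid [lo, hi]
        idx = int(round((np.clip(local_mean, lo, hi) - lo) / (hi - lo) * (levels - 1)))
        messages.append(idx)                      # the short message sent to the server
    decoded = [lo + i / (levels - 1) * (hi - lo) for i in messages]
    return float(np.mean(decoded))                # server-side estimate

# toy usage: m = 50 machines, n = 100 samples each, true mean 1.5
rng = np.random.default_rng(0)
data = [rng.normal(1.5, 1.0, size=100) for _ in range(50)]
print(one_shot_mean_estimate(data))               # close to 1.5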

Citations

One-Shot Federated Learning: Theoretical Limits and Algorithms to Achieve Them

TLDR
An estimator called the Multi-Resolution Estimator (MRE) is proposed whose expected error (when $B\ge\log mn$) meets the aforementioned lower bound up to poly-logarithmic factors and is thereby order optimal.

LOSP: Overlap Synchronization Parallel With Local Compensation for Fast Distributed Training

TLDR
A new method named LOSP is proposed, which introduces local compensation into the previous synchronization mechanism to mitigate the adverse effects of overlapping synchronization; it is theoretically proven that LOSP preserves the same convergence rate as sequential SGD for non-convex problems.

Distilled One-Shot Federated Learning

TLDR
Distilled One-Shot Federated Learning is proposed, which reduces the number of communication rounds required to train a performant model to only one and represents a new direction, orthogonal to previous work, towards weight-less and gradient-less federated learning.

FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning

TLDR
The proposed FedDM reduces communication rounds and improves model quality by transmitting more informative and smaller synthesized data compared with unwieldy model weights, and results show that the method can outperform other FL counterparts in terms of efficiency and model performance.

Non-IID Distributed Learning with Optimal Mixture Weights

Distributed learning can effectively handle training models on large-scale data and has attracted much attention in recent years. However, most existing distributed learning algorithms… 

References

SHOWING 1-10 OF 19 REFERENCES

One-Shot Federated Learning: Theoretical Limits and Algorithms to Achieve Them

TLDR
An estimator called the Multi-Resolution Estimator (MRE) is proposed whose expected error (when $B\ge\log mn$) meets the aforementioned lower bound up to poly-logarithmic factors and is thereby order optimal.

Communication lower bounds for statistical estimation problems via a distributed data processing inequality

TLDR
A distributed data processing inequality is proved, as a generalization of usual data processing inequalities, which might be of independent interest and useful for other problems.

Communication-efficient algorithms for statistical optimization

TLDR
A sharp analysis of this average mixture algorithm is provided, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as $O(N^{-1} + (N/m)^{-2})$.
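As a concrete reading of the average-mixture idea, the sketch below splits the data across m machines, has each machine solve a local least-squares problem, and averages the m local solutions at the server; the least-squares objective, the small ridge term, and all names here are illustrative assumptions rather than the paper's setup.

import numpy as np

def avgm_least_squares(X_parts, y_parts, reg=1e-6):
    """Average-mixture estimator (sketch): each machine computes a local
    least-squares solution on its shard, the server returns the plain average."""
    local_solutions = []
    for X, y in zip(X_parts, y_parts):
        d = X.shape[1]
        theta = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ y)  # local ERM
        local_solutions.append(theta)
    return np.mean(local_solutions, axis=0)       # combined parameter

# toy usage: m = 10 machines, 200 samples each, d = 5
rng = np.random.default_rng(1)
theta_true = rng.normal(size=5)
X_parts = [rng.normal(size=(200, 5)) for _ in range(10)]
y_parts = [X @ theta_true + 0.1 * rng.normal(size=200) for X in X_parts]
print(avgm_least_squares(X_parts, y_parts))       # close to theta_true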

Distributed Statistical Machine Learning in Adversarial Settings: Byzantine Gradient Descent

  • Yudong Chen, Lili Su, Jiaming Xu
  • Computer Science
    Abstracts of the 2018 ACM International Conference on Measurement and Modeling of Computer Systems
  • 2018
TLDR
This paper proposes a simple variant of the classical gradient descent method and proves that the aggregated gradient, as a function of the model parameter, converges uniformly to the true gradient function.
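The summary does not name the aggregation rule, so the sketch below uses a coordinate-wise median of the machines' gradients as a stand-in robust aggregator; treat that choice, and every name in the snippet, as an assumption rather than the paper's exact construction.

import numpy as np

def robust_gradient_step(theta, local_gradients, lr=0.1):
    """One gradient-descent step in which the server aggregates the machines'
    gradients with a coordinate-wise median (a stand-in robust aggregator),
    so a minority of Byzantine machines cannot drag the update far."""
    G = np.stack(local_gradients)                 # shape (num_machines, d)
    robust_grad = np.median(G, axis=0)            # coordinate-wise median over machines
    return theta - lr * robust_grad

# toy usage: 8 honest gradients of ||theta||^2 plus 2 adversarial ones
rng = np.random.default_rng(2)
theta = np.ones(3)
honest = [2 * theta + 0.01 * rng.normal(size=3) for _ in range(8)]
byzantine = [np.array([1e6, -1e6, 1e6]) for _ in range(2)]
print(robust_gradient_step(theta, honest + byzantine))   # barely affected by the outliers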

Communication-Efficient Distributed Learning of Discrete Distributions

TLDR
This work designs distributed learning algorithms that achieve significantly better communication guarantees than the naive ones, and obtains tight upper and lower bounds in several regimes of distribution learning.

Distributed Semi-supervised Learning with Kernel Ridge Regression

TLDR
This paper provides an error analysis for distributed semi-supervised learning with kernel ridge regression (DSKRR) based on a divide-and-conquer strategy and shows that unlabeled data play an important role in reducing the distributed error and in allowing a larger number of data subsets in DSKRR.
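A minimal sketch of the divide-and-conquer side of this approach, assuming plain (fully supervised) kernel ridge regression with an RBF kernel and averaged local predictions; the semi-supervised use of unlabeled data analyzed in the paper is not modeled here, and all names are illustrative.

import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def dc_krr_predict(X_parts, y_parts, X_test, lam=1e-2, gamma=1.0):
    """Divide-and-conquer kernel ridge regression (sketch): fit KRR on each
    data subset separately, then average the local predictions on the test set."""
    preds = []
    for X, y in zip(X_parts, y_parts):
        K = rbf_kernel(X, X, gamma)
        alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)  # local KRR fit
        preds.append(rbf_kernel(X_test, X, gamma) @ alpha)             # local prediction
    return np.mean(preds, axis=0)                 # averaged (global) prediction

# toy usage: 4 subsets of a 1-D regression problem
rng = np.random.default_rng(3)
X_parts = [rng.uniform(-3, 3, size=(50, 1)) for _ in range(4)]
y_parts = [np.sin(X[:, 0]) + 0.1 * rng.normal(size=50) for X in X_parts]
X_test = np.linspace(-3, 3, 5)[:, None]
print(dc_krr_predict(X_parts, y_parts, X_test))   # roughly sin(X_test)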

Communication-Efficient Learning of Deep Networks from Decentralized Data

TLDR
This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
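The core of iterative model averaging is a data-size-weighted average of the clients' locally trained parameters; the sketch below shows only that server-side aggregation step for one round, with shapes and names chosen for illustration.

import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Server-side step of iterative model averaging: combine the clients'
    locally trained parameter vectors, weighting each by its local data size."""
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    stacked = np.stack(client_weights)            # shape (num_clients, num_params)
    return (weights[:, None] * stacked).sum(axis=0)

# toy usage: three clients with different amounts of local data
w = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([0.8, 2.2])]
print(fedavg_aggregate(w, client_sizes=[100, 300, 600]))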

Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling

TLDR
This work develops and analyzes distributed algorithms based on dual subgradient averaging, provides sharp bounds on their convergence rates as a function of the network size and topology, and shows that the number of iterations required by the algorithm scales inversely in the spectral gap of the network.

Deep learning with Elastic Averaging SGD

TLDR
Experiments demonstrate that the new algorithm accelerates the training of deep architectures compared to DOWNPOUR and other common baseline approaches and furthermore is very communication efficient.
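The summary does not spell out the update rule, so the following is only a sketch of the elastic-averaging idea the title refers to: each worker takes a gradient step plus a pull toward a shared center variable, and the center drifts toward the workers. The toy quadratic objective, step sizes, and names are assumptions.

import numpy as np

def easgd_round(workers, center, grads, lr=0.05, rho=0.5):
    """One synchronous round of an elastic-averaging update (sketch):
    workers take a gradient step plus an elastic pull toward the center,
    and the center moves toward the average of the workers."""
    new_workers = [x - lr * (g + rho * (x - center)) for x, g in zip(workers, grads)]
    new_center = center + lr * rho * sum(x - center for x in workers)
    return new_workers, new_center

# toy usage: minimize ||x||^2 on 4 workers (gradient is 2x)
workers = [np.array([3.0, -2.0]) + i for i in range(4)]
center = np.zeros(2)
for _ in range(100):
    grads = [2 * x for x in workers]
    workers, center = easgd_round(workers, center, grads)
print(center)                                     # near the optimum at the origin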

Communication-Efficient Distributed Statistical Inference

TLDR
CSL provides a communication-efficient surrogate to the global likelihood that can be used for low-dimensional estimation, high-dimensional regularized estimation, and Bayesian inference, and it significantly improves the computational efficiency of Markov chain Monte Carlo algorithms even in a nondistributed setting.