• Corpus ID: 203610177

The Non-IID Data Quagmire of Decentralized Machine Learning

@article{Hsieh2020TheND,
  title={The Non-IID Data Quagmire of Decentralized Machine Learning},
  author={Kevin Hsieh and Amar Phanishayee and Onur Mutlu and Phillip B. Gibbons},
  journal={ArXiv},
  year={2020},
  volume={abs/1910.00189}
}
Many large-scale machine learning (ML) applications need to perform decentralized learning over datasets generated at different devices and locations. Such datasets pose a significant challenge to decentralized learning because their different contexts result in significant data distribution skew across devices/locations. In this paper, we take a step toward better understanding this challenge by presenting a detailed experimental study of decentralized DNN training on a common type of data… 
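To make the kind of data distribution skew studied here concrete, below is a minimal sketch (not the paper's exact setup) of partitioning a labeled dataset across devices so that each device is dominated by a few classes; `num_devices`, `classes_per_device`, and the split logic are illustrative assumptions.

```python
import numpy as np

def partition_with_label_skew(labels, num_devices=10, classes_per_device=2, seed=0):
    """Assign sample indices to devices so each device is dominated by a few classes.

    Illustrative non-IID (label-skew) partitioner, not the paper's exact setup.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    device_indices = [[] for _ in range(num_devices)]

    # Give each device a small set of "dominant" classes.
    dominant = [rng.choice(classes, size=classes_per_device, replace=False)
                for _ in range(num_devices)]

    for c in classes:
        idx = rng.permutation(np.where(labels == c)[0])
        # Devices whose dominant classes include c share this class's samples.
        owners = [d for d in range(num_devices) if c in dominant[d]] or [int(rng.integers(num_devices))]
        for part, d in zip(np.array_split(idx, len(owners)), owners):
            device_indices[d].extend(part.tolist())

    return device_indices

# Example: 10 devices over a toy 10-class label vector.
toy_labels = np.repeat(np.arange(10), 100)
parts = partition_with_label_skew(toy_labels)
print([len(p) for p in parts])
```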
D-Cliques: Compensating for Data Heterogeneity with Topology in Decentralized Federated Learning
TLDR
D-Cliques is presented, a novel topology that reduces gradient bias by grouping nodes in sparsely interconnected cliques such that the label distribution in a clique is representative of the global label distribution.
D-Cliques: Compensating NonIIDness in Decentralized Federated Learning with Topology
TLDR
D-Cliques is presented, a novel topology that reduces gradient bias by grouping nodes in interconnected cliques such that the local joint distribution in a clique is representative of the global class distribution.
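As a rough illustration of the clique idea (a greedy sketch under assumed parameters, not the authors' construction), nodes with skewed local label histograms can be grouped so that each clique's combined histogram approaches the global label distribution; `clique_size` and the greedy selection rule are assumptions.

```python
import numpy as np

def greedy_cliques(node_histograms, clique_size=10):
    """Greedily group nodes into cliques whose summed label histograms
    are close to the global label distribution (illustrative only)."""
    hists = np.asarray(node_histograms, dtype=float)
    global_dist = hists.sum(axis=0) / hists.sum()
    unassigned = list(range(len(hists)))
    cliques = []
    while unassigned:
        clique = [unassigned.pop(0)]
        while len(clique) < clique_size and unassigned:
            current = hists[clique].sum(axis=0)
            # Pick the node that brings the clique's distribution closest to global.
            best = min(
                unassigned,
                key=lambda n: np.abs(
                    (current + hists[n]) / (current + hists[n]).sum() - global_dist
                ).sum(),
            )
            clique.append(best)
            unassigned.remove(best)
        cliques.append(clique)
    return cliques

# Example: 100 nodes, each holding mostly one of 10 classes.
rng = np.random.default_rng(0)
node_hists = np.eye(10)[rng.integers(0, 10, size=100)] * 450 + 5
print(greedy_cliques(node_hists)[:2])
```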
FedDNA: Federated Learning with Decoupled Normalization-Layer Aggregation for Non-IID Data
TLDR
A novel decoupled parameter aggregation method called FedDNA is proposed to deal with the performance issues caused by CDCS and achieves significant performance improvement compared to the state-of-the-art methods.
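The TLDR does not spell out the aggregation rule, so the following is only a hedged sketch of the general decoupling idea: normalization-layer statistics (e.g., BatchNorm running mean/variance) are aggregated separately from gradient-trained weights. The weighting scheme shown is an assumption, not FedDNA's actual method.

```python
def decoupled_aggregate(client_states, client_sizes):
    """Average gradient-trained weights and normalization statistics separately.

    client_states: list of dicts mapping parameter name -> value.
    Keys containing 'running_mean' or 'running_var' are treated as statistical
    parameters. Illustrative only; FedDNA's actual weighting differs.
    """
    total = sum(client_sizes)
    agg = {}
    for name in client_states[0]:
        is_stat = "running_mean" in name or "running_var" in name
        if is_stat:
            # Simple unweighted mean for statistical parameters (assumption).
            vals = [s[name] for s in client_states]
            agg[name] = sum(vals) / len(vals)
        else:
            # Size-weighted average for ordinary weights (FedAvg-style).
            agg[name] = sum(s[name] * n for s, n in zip(client_states, client_sizes)) / total
    return agg

# Example with scalar "parameters" for brevity.
clients = [{"layer.weight": 1.0, "bn.running_mean": 0.2},
           {"layer.weight": 3.0, "bn.running_mean": 0.6}]
print(decoupled_aggregate(clients, client_sizes=[100, 300]))
```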
OpenFed: An Open-Source Security and Privacy Guaranteed Federated Learning Framework
  • Dengsheng Chen
  • Computer Science
    ArXiv
  • 2021
TLDR
This work presents OpenFed, an open-source software framework to simultaneously address the demands for data protection and utilization, which enables state-of-the-art model development in low-trust environments despite limited local data availability and lays the groundwork for sustainable collaborative model development and commercial deployment.
Sparse-Push: Communication- & Energy-Efficient Decentralized Distributed Learning over Directed & Time-Varying Graphs with non-IID Datasets
TLDR
Sparse-Push is proposed, a communication-efficient decentralized distributed training algorithm that supports training over peer-to-peer, directed, and time-varying graph topologies and enables a 466× reduction in communication with only 1% degradation in performance when training various DL models such as ResNet20 and VGG11 over the CIFAR-10 dataset.
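The communication reduction here generally comes from transmitting only a small fraction of gradient entries; the top-k sparsifier below is a generic sketch of that ingredient, not Sparse-Push's exact protocol (which also handles directed, time-varying graphs with push-sum-style averaging).

```python
import numpy as np

def topk_sparsify(grad, fraction=0.01):
    """Keep only the largest-magnitude `fraction` of gradient entries.

    Returns (indices, values) to transmit plus the residual kept locally.
    Generic sketch; not the exact Sparse-Push protocol.
    """
    flat = grad.ravel()
    k = max(1, int(fraction * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    values = flat[idx]
    residual = flat.copy()
    residual[idx] = 0.0          # unsent entries stay as local residual
    return idx, values, residual.reshape(grad.shape)

g = np.random.default_rng(0).normal(size=(256, 256))
idx, vals, res = topk_sparsify(g, fraction=0.01)
print(len(vals), "of", g.size, "entries transmitted")
```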
SplitAVG: A heterogeneity-aware federated deep learning method for medical imaging
TLDR
The SplitAVG method can effectively overcome the performance drops caused by variability in data distributions across institutions, and can be adapted to different base networks and generalized to various types of medical imaging tasks.
DESED-FL and URBAN-FL: Federated Learning Datasets for Sound Event Detection
TLDR
The results indicate that FL is a promising approach for SED, but faces challenges with divergent data distributions inherent to distributed client edge devices.
FedMix: Approximation of Mixup under Mean Augmented Federated Learning
TLDR
A new augmentation algorithm is proposed, named FedMix, which is inspired by a phenomenal yet simple data augmentation method, Mixup, but does not require local raw data to be directly shared among devices.
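To illustrate the mean-augmented idea, here is a simplified sketch that is closer to naive mixing than to FedMix's actual loss approximation: clients exchange only averaged batches, and each local example is interpolated with a received mean, Mixup-style. The mixing coefficient `lam` and the batch shapes are illustrative.

```python
import numpy as np

def mean_augment(local_x, local_y, mean_x, mean_y, lam=0.1, seed=0):
    """Mix each local example with a mean batch received from other clients.

    Mixup-style interpolation on averaged data only; raw data never leaves
    its client. Simplified sketch -- FedMix itself approximates the Mixup
    loss rather than materializing mixed examples like this.
    """
    rng = np.random.default_rng(seed)
    # Pick, for every local sample, one of the received mean batches.
    pick = rng.integers(0, len(mean_x), size=len(local_x))
    mixed_x = (1.0 - lam) * local_x + lam * mean_x[pick]
    mixed_y = (1.0 - lam) * local_y + lam * mean_y[pick]
    return mixed_x, mixed_y

# Example: 32 local images (flattened), 5 mean batches from other clients.
x = np.random.rand(32, 784); y = np.eye(10)[np.random.randint(0, 10, 32)]
mx = np.random.rand(5, 784); my = np.full((5, 10), 0.1)
print(mean_augment(x, y, mx, my)[0].shape)
```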
Low Precision Decentralized Distributed Training with Heterogeneous Data
TLDR
The proposed low precision decentralized training decreases computational complexity, memory usage, and communication cost by ∼4× while trading off less than 1% accuracy for both IID and non-IID data, indicating the regularization effect of quantization.
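As a generic illustration of the low-precision communication involved (not this paper's exact scheme), gradients or model updates can be stochastically quantized to a few bits before exchange; the `bits` setting and scaling are assumptions.

```python
import numpy as np

def stochastic_quantize(x, bits=4, seed=0):
    """Uniform stochastic quantization of a tensor to `bits` bits.

    Generic sketch of low-precision communication, not the paper's scheme.
    Returns the dequantized tensor as the receiver would reconstruct it.
    """
    rng = np.random.default_rng(seed)
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (x - lo) / scale
    # Round up or down at random so the quantizer is unbiased in expectation.
    q = np.floor(normalized + rng.random(x.shape)).clip(0, levels)
    return q * scale + lo

g = np.random.default_rng(1).normal(size=(1000,))
gq = stochastic_quantize(g, bits=4)
print("mean abs error:", np.abs(g - gq).mean())
```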
On Large-Cohort Training for Federated Learning
TLDR
This work explores how the number of clients sampled at each round (the cohort size) impacts the quality of the learned model and the training dynamics of federated learning algorithms.

References

Showing 1–10 of 76 references
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
TLDR
This paper finds 99.9% of the gradient exchange in distributed SGD is redundant, and proposes Deep Gradient Compression (DGC) to greatly reduce the communication bandwidth, which enables large-scale distributed training on inexpensive commodity 1 Gbps Ethernet and facilitates distributed training on mobile.
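A hedged sketch of the core mechanism follows: gradient sparsification with local accumulation of the unsent remainder. DGC additionally uses momentum correction, gradient clipping, and warm-up, which are omitted here.

```python
import numpy as np

class SparsifiedWorker:
    """Send only the largest-magnitude fraction of accumulated gradient entries.

    Minimal sketch of deep gradient compression; momentum correction,
    gradient clipping, and warm-up from the paper are omitted.
    """

    def __init__(self, shape, keep_fraction=0.001):
        self.residual = np.zeros(shape)
        self.keep_fraction = keep_fraction

    def compress(self, grad):
        acc = self.residual + grad
        flat = acc.ravel()
        k = max(1, int(self.keep_fraction * flat.size))
        idx = np.argpartition(np.abs(flat), -k)[-k:]
        sparse = np.zeros_like(flat)
        sparse[idx] = flat[idx]          # entries actually transmitted
        flat[idx] = 0.0                  # the rest is kept as local residual
        self.residual = flat.reshape(acc.shape)
        return sparse.reshape(acc.shape)

w = SparsifiedWorker(shape=(1000, 1000))
sent = w.compress(np.random.default_rng(0).normal(size=(1000, 1000)))
print(np.count_nonzero(sent), "nonzero entries sent")
```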
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
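A minimal residual block in PyTorch, illustrating the identity-shortcut idea; the exact ResNet architectures in the paper use specific layer counts and downsampling shortcuts not shown in this sketch.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal residual block: output = relu(F(x) + x).

    Illustrates the shortcut-connection idea; the paper's ResNets add
    downsampling shortcuts and specific layer configurations.
    """

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity shortcut

block = BasicResidualBlock(channels=16)
print(block(torch.randn(1, 16, 32, 32)).shape)
```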
ImageNet Large Scale Visual Recognition Challenge
TLDR
The creation of this benchmark dataset and the advances in object recognition that have been possible as a result are described, and state-of-the-art computer vision accuracy is compared with human accuracy.
Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
  • S. Ioffe
  • Computer Science, Mathematics
    NIPS
  • 2017
TLDR
This work proposes Batch Renormalization, a simple and effective extension to ensure that the training and inference models generate the same outputs that depend on individual examples rather than the entire minibatch.
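A NumPy sketch of the batch renormalization correction: batch statistics are adjusted toward the running statistics through clipped factors r and d, which are treated as constants in the backward pass during training; `r_max` and `d_max` are assumed clipping bounds.

```python
import numpy as np

def batch_renorm_forward(x, running_mean, running_var, gamma, beta,
                         r_max=3.0, d_max=5.0, eps=1e-5):
    """Batch renormalization forward pass (forward-only sketch in NumPy).

    r and d correct the batch statistics toward the running statistics; in
    training they would be treated as constants (no gradient through them).
    """
    batch_mean = x.mean(axis=0)
    batch_std = x.std(axis=0) + eps
    running_std = np.sqrt(running_var) + eps

    r = np.clip(batch_std / running_std, 1.0 / r_max, r_max)
    d = np.clip((batch_mean - running_mean) / running_std, -d_max, d_max)

    x_hat = (x - batch_mean) / batch_std * r + d
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=2.0, scale=3.0, size=(64, 8))
y = batch_renorm_forward(x, running_mean=np.zeros(8), running_var=np.ones(8),
                         gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(2))
```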
Communication-Efficient Learning of Deep Networks from Decentralized Data
TLDR
This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
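A compact sketch of the iterative model averaging loop (FedAvg-style): sampled clients train locally and the server takes a data-size-weighted average of the returned models. The `local_update` functions and client sampling below are placeholders standing in for real local SGD.

```python
import numpy as np

def fedavg(global_model, clients, rounds=10, clients_per_round=2, seed=0):
    """Iterative model averaging: each round, sampled clients train locally
    and the server takes a data-size-weighted average of their models.

    `clients` is a list of (local_update_fn, num_samples); local_update_fn
    maps a model vector to a locally updated model vector (placeholder
    for real local SGD).
    """
    rng = np.random.default_rng(seed)
    model = np.array(global_model, dtype=float)
    for _ in range(rounds):
        picked = rng.choice(len(clients), size=clients_per_round, replace=False)
        updates, sizes = [], []
        for i in picked:
            local_update, n = clients[i]
            updates.append(local_update(model.copy()))
            sizes.append(n)
        weights = np.array(sizes, dtype=float) / sum(sizes)
        # Weighted average becomes the new global model, broadcast next round.
        model = sum(w * u for w, u in zip(weights, updates))
    return model

# Toy example: each "client" pulls the model toward its own target vector.
make_client = lambda target, n: (lambda m: m + 0.5 * (target - m), n)
clients = [make_client(np.full(3, v), n) for v, n in [(1.0, 100), (3.0, 300), (5.0, 50)]]
print(fedavg(np.zeros(3), clients))
```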
Layer Normalization
  • CoRR
  • 2016
A Stochastic Approximation Method
Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the solution x = θ of the equation M(x) = α, where α is a given constant.
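A tiny sketch of the resulting stochastic approximation iteration, x_{n+1} = x_n + a_n(α − y_n) with step sizes a_n = c/n, for finding the level x where M(x) = α from noisy responses; the `noisy_response` function below is illustrative.

```python
import numpy as np

def robbins_monro(noisy_response, alpha, x0=0.0, steps=5000, c=1.0, seed=0):
    """Find x with M(x) = alpha from noisy observations y_n of M(x_n).

    Uses the classic update x_{n+1} = x_n + a_n * (alpha - y_n) with
    a_n = c / n, which satisfies sum a_n = inf and sum a_n^2 < inf.
    """
    rng = np.random.default_rng(seed)
    x = x0
    for n in range(1, steps + 1):
        y = noisy_response(x, rng)
        x = x + (c / n) * (alpha - y)
    return x

# Illustrative M(x) = 2x + 1 observed with Gaussian noise; solve M(x) = 5.
noisy = lambda x, rng: 2.0 * x + 1.0 + rng.normal(scale=0.5)
print(robbins_monro(noisy, alpha=5.0))   # should approach x = 2
```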
Gradient-based learning applied to document recognition
TLDR
This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task, and convolutional neural networks are shown to outperform all other techniques.
D2: Decentralized Training over Decentralized Data
TLDR
D², a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance among workers, is presented and empirically evaluated on image classification tasks where each worker has access to only the data of a limited set of labels, and significantly outperforms D-PSGD.
Federated Learning with Non-IID Data
TLDR
This work presents a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices, and shows that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.
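A sketch of the data-sharing strategy described in the TLDR: a small globally shared subset is distributed to every device and concatenated with its skewed local partition before local training. The 5% fraction matches the TLDR; the rest of the code is illustrative.

```python
import numpy as np

def build_shared_and_local(x, y, device_indices, shared_fraction=0.05, seed=0):
    """Carve out a small globally shared subset and append it to each device's
    local (skewed) partition, as in the data-sharing strategy for non-IID FL.
    """
    rng = np.random.default_rng(seed)
    n_shared = int(shared_fraction * len(x))
    shared_idx = rng.choice(len(x), size=n_shared, replace=False)
    shared_x, shared_y = x[shared_idx], y[shared_idx]

    per_device = []
    for idx in device_indices:
        local_x = np.concatenate([x[idx], shared_x])
        local_y = np.concatenate([y[idx], shared_y])
        per_device.append((local_x, local_y))
    return per_device

# Toy usage with a 1000-sample dataset split across 4 devices.
x = np.random.rand(1000, 8); y = np.random.randint(0, 10, 1000)
splits = np.array_split(np.arange(1000), 4)
print([len(dx) for dx, _ in build_shared_and_local(x, y, splits)])
```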