Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better

@inproceedings{Bibikar2021FederatedDS,
  title={Federated Dynamic Sparse Training: Computing Less, Communicating Less, Yet Learning Better},
  author={Sameer Bibikar and Haris Vikalo and Zhangyang Wang and Xiaohan Chen},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2021}
}
Federated learning (FL) enables distribution of machine learning workloads from the cloud to resource-limited edge devices. Unfortunately, current deep networks remain not only too compute-heavy for inference and training on edge devices, but also too large for communicating updates over bandwidth-constrained networks. In this paper, we develop, implement, and experimentally validate a novel FL framework termed Federated Dynamic Sparse Training (FedDST) by which complex neural networks can be… 
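To make the sparse-training idea concrete, below is a minimal numpy sketch of a RigL-style prune-and-regrow step, the kind of dynamic sparse training FedDST runs on each client between communication rounds. The layer shape, adjustment fraction, and regrow criterion are illustrative assumptions, not the paper's exact procedure or hyperparameters.

```python
# Illustrative sketch only: one prune-and-regrow mask adjustment at fixed sparsity.
import numpy as np

def adjust_mask(weights, grads, mask, adjust_frac=0.1):
    """Drop the smallest-magnitude active weights and regrow the same number
    of previously inactive connections with the largest gradient magnitude,
    so the overall sparsity level stays fixed."""
    n_active = int(mask.sum())
    n_adjust = max(1, int(adjust_frac * n_active))

    # Prune: among active weights, zero out the n_adjust smallest by magnitude.
    active_idx = np.flatnonzero(mask)
    drop = active_idx[np.argsort(np.abs(weights.ravel()[active_idx]))[:n_adjust]]
    new_mask = mask.copy().ravel()
    new_mask[drop] = 0

    # Regrow: among positions inactive before pruning, activate the n_adjust
    # with the largest gradient magnitude.
    inactive_idx = np.flatnonzero(mask.ravel() == 0)
    grow = inactive_idx[np.argsort(-np.abs(grads.ravel()[inactive_idx]))[:n_adjust]]
    new_mask[grow] = 1
    return new_mask.reshape(mask.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))
g = rng.normal(size=(64, 32))
m = (rng.random((64, 32)) < 0.2).astype(np.float64)   # ~80% sparse toy layer
m_new = adjust_mask(w * m, g, m)
print(m.sum(), m_new.sum())                           # sparsity is preserved
```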

Citations

Federated Sparse Training: Lottery Aware Model Compression for Resource Constrained Edge

This paper proposes federated lottery aware sparsity hunting (FLASH), a unified sparse learning framework that lets the server win a lottery in the form of a sparse sub-model, which can greatly improve performance under highly resource-limited client settings.

FedTiny: Pruned Federated Learning Towards Specialized Tiny Models

FedTiny is developed, a novel distributed pruning framework for FL that obtains specialized tiny models for memory- and computing-constrained participating devices with confidential local data; it outperforms state-of-the-art baseline approaches when compressing deep models to extremely sparse tiny models.

Lottery Aware Sparsity Hunting: Enabling Federated Learning on Resource-Limited Edge

This paper proposes federated lottery aware sparsity hunting (FLASH), a unified sparse learning framework that lets the server win a lottery in the form of a sparse sub-model while maintaining classification performance under highly resource-limited client settings.

Sparse Random Networks for Communication-Efficient Federated Learning

This work proposes a radically different approach to federated learning that does not update the weights at all, and freezes the weight updates at their initial random values and learns how to sparsify the random network for the best performance.
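A rough sketch of this mask-over-frozen-weights idea follows: the random weights are never updated, only a per-weight score, and the forward pass keeps the top-scoring fraction of connections. The top-k mask construction and the straight-through score update are common choices for this family of methods and are assumptions here, not necessarily the cited paper's exact rules.

```python
# Illustrative sketch only: learn which random weights to keep, not their values.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 8))           # frozen random weights (never updated)
scores = rng.normal(size=W.shape)      # trainable scores, one per weight
keep_frac = 0.3

def mask_from_scores(scores, keep_frac):
    k = int(keep_frac * scores.size)
    thresh = np.sort(scores.ravel())[-k]
    return (scores >= thresh).astype(W.dtype)

def forward(x):
    return x @ (W * mask_from_scores(scores, keep_frac)).T

x = rng.normal(size=(4, 8))
y = forward(x)

# Straight-through score update: treat the mask as identity in the backward
# pass, so each score receives the gradient its masked weight would receive.
grad_W_eff = rng.normal(size=W.shape)  # placeholder for dLoss/d(W * mask)
scores -= 0.1 * grad_W_eff * W         # dLoss/dscore ≈ dLoss/d(W * mask) * W
```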

Intrinsic Gradient Compression for Scalable and Efficient Federated Learning

This work proves a general correspondence between the notions of intrinsic dimension and gradient compressibility, and shows that a family of low-bandwidth federated learning algorithms, which are called intrinsic gradient compression algorithms, naturally emerges from this correspondence.
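The correspondence can be illustrated with a low-dimensional random-projection compressor: client and server share only a projection seed, and the uplink carries d coefficients instead of D parameters. The dense Gaussian projection below is an illustrative assumption, not the paper's exact construction.

```python
# Illustrative sketch only: gradient compression into a random d-dimensional subspace.
import numpy as np

D, d = 10_000, 64                      # full parameter count vs. intrinsic dimension
seed = 42                              # shared between client and server

def projection(seed):
    return np.random.default_rng(seed).normal(size=(D, d)) / np.sqrt(d)

def compress(grad, seed):
    return projection(seed).T @ grad   # d numbers sent uplink

def decompress(coeffs, seed):
    return projection(seed) @ coeffs   # server reconstructs an approximation

g = np.random.default_rng(0).normal(size=D)
c = compress(g, seed)
g_hat = decompress(c, seed)
print(c.shape, g_hat.shape)            # (64,) (10000,)
```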

Centaur: Federated Learning for Constrained Edge Devices

Centaur is a multitier FL framework that enables ultra-constrained devices to efficiently participate in FL on large neural nets by combining a data selection scheme, which chooses a portion of samples that accelerates learning, with a partition-based training algorithm that integrates both constrained and powerful devices owned by the same user.

Federated Progressive Sparsification (Purge, Merge, Tune)+

This work develops FedSparsify, a sparsification strategy based on progressive weight-magnitude pruning that can shrink a model to a tenth of its original size with the same or better accuracy compared to existing pruning and non-pruning baselines.
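A minimal sketch of progressive magnitude pruning of the aggregated model is shown below; the cubic sparsity schedule, round counts, and final sparsity are illustrative assumptions rather than FedSparsify's actual settings.

```python
# Illustrative sketch only: raise the sparsity target over rounds and prune by magnitude.
import numpy as np

def target_sparsity(round_idx, total_rounds, final_sparsity=0.9):
    """Cubic ramp from 0 to final_sparsity, a common pruning schedule."""
    t = min(round_idx / total_rounds, 1.0)
    return final_sparsity * (1 - (1 - t) ** 3)

def magnitude_prune(weights, sparsity):
    k = int(sparsity * weights.size)
    if k == 0:
        return weights
    thresh = np.sort(np.abs(weights).ravel())[k - 1]
    return np.where(np.abs(weights) > thresh, weights, 0.0)

w = np.random.default_rng(2).normal(size=(256,))
for r in range(0, 101, 25):
    s = target_sparsity(r, 100)
    pruned = magnitude_prune(w, s)
    print(f"round {r:3d}: target {s:.2f}, actual {np.mean(pruned == 0):.2f}")
```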

Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

This work draws a unique connection between sparse neural network training and deep ensembles, yielding a novel efficient ensemble learning framework called FreeTickets, which surpasses the dense baseline in all the following criteria: prediction accuracy, uncertainty estimation, out-of-distribution (OoD) robustness, as well as efficiency for both training and inference.
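The ensembling step itself is simple to sketch: sparse subnetworks snapshotted at different points of one dynamic-sparse-training run are treated as ensemble members and their softmax outputs are averaged. The toy linear "checkpoints" below are placeholders, not the paper's models.

```python
# Illustrative sketch only: average the predictions of several sparse snapshots.
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(3)
x = rng.normal(size=(5, 20))                        # a small test batch
snapshots = [rng.normal(size=(20, 10)) * (rng.random((20, 10)) < 0.2)
             for _ in range(4)]                     # 4 sparse "checkpoints"

probs = np.mean([softmax(x @ W) for W in snapshots], axis=0)
pred = probs.argmax(axis=1)
print(pred)
```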

Towards Sparsified Federated Neuroimaging Models via Weight Pruning

It is demonstrated that models with high sparsity are less susceptible to membership inference attacks, a type of privacy attack, and FedSparsify, which performs model pruning during federated training, is proposed.


References

Showing 1-10 of 40 references

Model Pruning Enables Efficient Federated Learning on Edge Devices

PruneFL is proposed: a novel FL approach with adaptive and distributed parameter pruning, which adapts the model size during FL to reduce both communication and computation overhead and minimize the overall training time, while maintaining accuracy similar to that of the original model.

Federated Learning: Strategies for Improving Communication Efficiency

Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
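Both ideas are easy to sketch in a few lines: a structured update confined to a seed-shared random mask, and a sketched update built from subsampling plus sign quantization. The densities and the quantizer below are illustrative assumptions, not the paper's exact constructions.

```python
# Illustrative sketch only: two ways to shrink the uplink payload of a model update.
import numpy as np

rng = np.random.default_rng(4)
update = rng.normal(size=10_000)

# Structured update: the client only ever learns entries on a seed-shared random
# mask, so it uploads just those values (the seed reproduces the positions).
mask = np.random.default_rng(123).random(update.size) < 0.05
structured_payload = update[mask]                     # ~5% of the values

# Sketched update: learn the full update, then subsample and quantize it.
keep = np.random.default_rng(456).random(update.size) < 0.1
sub = update[keep]
scale = np.abs(sub).mean()
sketched_payload = np.sign(sub).astype(np.int8)       # ~1 bit per kept value + one scale

# Server-side reconstruction of the sketched update (this simple version just
# rescales the kept signs and leaves the dropped coordinates at zero).
recon = np.zeros_like(update)
recon[keep] = sketched_payload * scale
print(structured_payload.size, sketched_payload.size, recon.shape)
```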

Federated Optimization in Heterogeneous Networks

This work introduces a framework, FedProx, to tackle heterogeneity in federated networks, and provides convergence guarantees for this framework when learning over data from non-identical distributions (statistical heterogeneity), and while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work.
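FedProx's key modification is the proximal term (mu/2)||w - w_global||^2 added to each client's local objective, which keeps local iterates close to the current global model. A minimal sketch with a toy quadratic local loss (the loss, learning rate, and step count are illustrative):

```python
# Illustrative sketch only: local SGD with FedProx's proximal term.
import numpy as np

def local_sgd_fedprox(w_global, local_grad_fn, mu=0.1, lr=0.01, steps=50):
    w = w_global.copy()
    for _ in range(steps):
        grad = local_grad_fn(w) + mu * (w - w_global)  # proximal gradient term
        w -= lr * grad
    return w

rng = np.random.default_rng(5)
w_global = rng.normal(size=10)
local_opt = rng.normal(size=10)                        # this client's optimum
grad_fn = lambda w: w - local_opt                      # gradient of 0.5 * ||w - opt||^2

w_local = local_sgd_fedprox(w_global, grad_fn, mu=0.1)
# With mu > 0 the client lands between its own optimum and the global model.
print(np.linalg.norm(w_local - local_opt), np.linalg.norm(w_local - w_global))
```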

Federated Learning with Matched Averaging

This work proposes the Federated Matched Averaging (FedMA) algorithm, designed for federated learning of modern neural network architectures, e.g., convolutional neural networks (CNNs) and LSTMs, and indicates that FedMA outperforms popular state-of-the-art federated learning algorithms on deep CNN and LSTM architectures trained on real-world datasets while improving communication efficiency.
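The matching-before-averaging idea can be illustrated for a single fully connected layer by permuting one client's neurons to best align with another's before averaging. FedMA's actual matching is a Bayesian-nonparametric formulation over all clients and layers; the two-client Hungarian matching below is only an illustration.

```python
# Illustrative sketch only: align neurons of two clients before averaging one layer.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(6)
A = rng.normal(size=(32, 64))                  # client A: 32 neurons x 64 inputs
perm = rng.permutation(32)
B = A[perm] + 0.05 * rng.normal(size=A.shape)  # client B: same neurons, permuted

def cosine(u, v):
    return (u @ v.T) / (np.linalg.norm(u, axis=1, keepdims=True)
                        * np.linalg.norm(v, axis=1))

row, col = linear_sum_assignment(-cosine(A, B))  # maximize total similarity
matched_avg = 0.5 * (A + B[col])                 # align B's neurons, then average
naive_avg = 0.5 * (A + B)

print(np.abs(matched_avg - A).mean(), np.abs(naive_avg - A).mean())
```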

Communication-Efficient Federated Learning via Optimal Client Sampling

This work proposes a novel, simple, and efficient way of updating the central model in communication-constrained settings: the optimal client sampling policy is determined by modeling the progression of clients' weights with an Ornstein-Uhlenbeck process.

Communication-Efficient Learning of Deep Networks from Decentralized Data

This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
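The server step of this method (FedAvg) reduces to a data-size-weighted average of the locally trained client models; a minimal sketch with toy client models:

```python
# Illustrative sketch only: FedAvg's weighted server-side averaging step.
import numpy as np

def fedavg(client_weights, client_num_samples):
    n = np.asarray(client_num_samples, dtype=float)
    coeffs = n / n.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

rng = np.random.default_rng(7)
clients = [rng.normal(size=100) for _ in range(4)]   # locally trained models
sizes = [120, 80, 200, 50]                           # local dataset sizes
w_global = fedavg(clients, sizes)
print(w_global.shape)
```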

Think Locally, Act Globally: Federated Learning with Local and Global Representations

A new federated learning algorithm is proposed that jointly learns compact local representations on each device and a global model across all devices, which helps to keep device data private and enable communication-efficient training while retaining performance.
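The split can be sketched as a private per-client encoder plus a shared head that is the only part communicated and averaged. The two-layer linear model below is a stand-in for real local encoders and a global classifier, not the paper's architecture.

```python
# Illustrative sketch only: private local representations, shared global head.
import numpy as np

rng = np.random.default_rng(8)
num_clients = 3
local_encoders = [rng.normal(size=(20, 16)) for _ in range(num_clients)]  # never shared
global_head = rng.normal(size=(16, 10))                                   # averaged

def client_forward(k, x):
    return (x @ local_encoders[k]) @ global_head

def server_aggregate(client_heads):
    return np.mean(client_heads, axis=0)   # only head parameters are communicated

# One illustrative round: clients "train" (here: perturb) their copies of the
# head, encoders stay local, and the server averages the heads only.
client_heads = [global_head + 0.01 * rng.normal(size=global_head.shape)
                for _ in range(num_clients)]
global_head = server_aggregate(client_heads)
print(global_head.shape)
```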

Adaptive Federated Optimization

This work proposes federated versions of adaptive optimizers, including Adagrad, Adam, and Yogi, and analyzes their convergence in the presence of heterogeneous data for general nonconvex settings to highlight the interplay between client heterogeneity and communication efficiency.
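In these methods the averaged client delta is treated as a pseudo-gradient and fed to an adaptive optimizer kept on the server. A FedAdam-style sketch follows; the hyperparameters and toy dimensions are illustrative.

```python
# Illustrative sketch only: server-side Adam-like update on the averaged client delta.
import numpy as np

class FedAdamServer:
    def __init__(self, dim, lr=0.1, beta1=0.9, beta2=0.99, tau=1e-3):
        self.lr, self.b1, self.b2, self.tau = lr, beta1, beta2, tau
        self.m = np.zeros(dim)
        self.v = np.zeros(dim)

    def step(self, w_global, client_deltas):
        delta = np.mean(client_deltas, axis=0)          # pseudo-gradient
        self.m = self.b1 * self.m + (1 - self.b1) * delta
        self.v = self.b2 * self.v + (1 - self.b2) * delta ** 2
        return w_global + self.lr * self.m / (np.sqrt(self.v) + self.tau)

rng = np.random.default_rng(9)
w = rng.normal(size=50)
server = FedAdamServer(dim=50)
deltas = [rng.normal(size=50) * 0.01 for _ in range(8)]  # per-client updates
w = server.step(w, deltas)
print(w.shape)
```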

Advances and Open Problems in Federated Learning

Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

This work draws a unique connection between sparse neural network training and deep ensembles, yielding a novel efficient ensemble learning framework called FreeTickets, which surpasses the dense baseline in all the following criteria: prediction accuracy, uncertainty estimation, out-of-distribution (OoD) robustness, as well as efficiency for both training and inference.