• Corpus ID: 238583689

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

Hui-Po Wang, Sebastian U. Stich, Yang He, Mario Fritz
Federated learning is a powerful distributed learning scheme that allows numerous edge devices to collaboratively train a model without sharing their data. However, training is resource-intensive for edge devices, and limited network bandwidth is often the main bottleneck. Prior work often overcomes these constraints by condensing the models or messages into compact formats, e.g., by gradient compression or distillation. In contrast, we propose ProgFed, the first progressive training framework…

FedCliP: Federated Learning with Client Pruning

This work proposes a framework that approximates which participating clients are valid, using a reliability score based on Gaussian Scale Mixture (GSM) modeling to prune clients, together with a communication-efficient client-pruning training method for the FL scenario.

FedNet2Net: Saving Communication and Computations in Federated Learning with Model Growing

A novel scheme based on the notion of “model growing” is proposed and is shown to achieve substantial reduction in communication and client computation while achieving comparable accuracy when compared to the current most effective strategies.

FedTune: A Deep Dive into Efficient Federated Fine-Tuning with Pre-trained Transformers

It is demonstrated that the fine-tuned Transformers achieve extraordinary performance on FL, and that the lightweight tuning method facilitates a fast convergence rate and low communication costs.

Accelerated Federated Learning with Decoupled Adaptive Optimization

A momentum-decoupling adaptive optimization method is developed to fully utilize the global momentum on each local iteration, accelerate training convergence, and overcome the possible inconsistency caused by adaptive optimization methods.

Resource-Efficient Federated Learning With Non-IID Data: An Auction Theoretic Approach

This work proposes a resource-efficient method for training an FL-based application with non-IID data, effectively minimizing cost through an auction approach and mitigating quality degradation through data sharing, and demonstrates that the profitability of the stakeholders can be maximized using the proposed auction procedure.

Federated Learning for Inference at Anytime and Anywhere

This simple framework provides fast and accurate FL while supporting heterogeneous device capabilities, efficient personalization, and scalable-cost anytime inference.



CosSGD: Communication-Efficient Federated Learning with a Simple Cosine-Based Quantization

This work proposes a simple cosine-based nonlinear quantization that achieves impressive results in compressing round-trip communication costs. It is highly suitable for federated learning problems because it has low computational complexity and requires only a little additional data to recover the compressed information.

Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge

This work reformulates FL as a group knowledge transfer training algorithm, called FedGKT, which designs a variant of the alternating minimization approach to train small CNNs on edge nodes and periodically transfer their knowledge by knowledge distillation to a large server-side CNN.

Communication-Efficient Learning of Deep Networks from Decentralized Data

This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
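
The model-averaging step mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each client's model is a flat list of floats and that the server weights each client by its local sample count. The function name `weighted_average` and its parameters are hypothetical.

```python
from typing import List, Sequence


def weighted_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighting each by its sample count.

    Assumption: every client sends a flat vector of the same length.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client models must have the same length")
        share = size / total  # this client's fraction of all samples
        for i in range(dim):
            averaged[i] += share * weights[i]
    return averaged


# Two hypothetical clients holding 1 and 3 samples respectively.
global_model = weighted_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a full training loop, the server would broadcast `global_model` back to the clients and repeat this step after each round of local training.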

Ensemble Distillation for Robust Model Fusion in Federated Learning

This work proposes ensemble distillation for model fusion, i.e. training the central classifier through unlabeled data on the outputs of the models from the clients, which allows flexible aggregation over heterogeneous client models that can differ e.g. in size, numerical precision or structure.

Bidirectional compression in heterogeneous settings for distributed or federated learning with partial participation: tight convergence guarantees

Artemis, a framework to tackle the problem of learning in a distributed or federated setting with communication constraints and device partial participation, provides fast rates of convergence under weak assumptions on the stochastic gradients and achieves a lower bound of the asymptotic variance that highlights practical limits of compression.

FedMD: Heterogenous Federated Learning via Model Distillation

This work uses transfer learning and knowledge distillation to develop a universal framework that enables federated learning when each agent owns not only their private data, but also uniquely designed models.

Sparsified SGD with Memory

This work analyzes Stochastic Gradient Descent with k-sparsification or compression (for instance top-k or random-k) and shows that this scheme converges at the same rate as vanilla SGD when equipped with error compensation.
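
A single step of top-k sparsification with error compensation might look like the sketch below. It is an illustration under stated assumptions rather than the authors' code: parameters and gradients are flat lists of floats, and the helper names `top_k` and `sparsified_step` are hypothetical. The entries dropped by the compressor are stored in a memory vector and added back before the next compression.

```python
from typing import List, Tuple


def top_k(vec: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def sparsified_step(
    params: List[float],
    grad: List[float],
    memory: List[float],
    lr: float,
    k: int,
) -> Tuple[List[float], List[float]]:
    """One SGD step that applies only a top-k sparse update.

    The part of the update that was not applied is returned as the new memory.
    """
    # Add the previously dropped residual to the scaled gradient.
    corrected = [m + lr * g for m, g in zip(memory, grad)]
    # Only this sparse vector would need to be communicated.
    update = top_k(corrected, k)
    # Remember what was dropped so it can be applied in a later step.
    new_memory = [c - u for c, u in zip(corrected, update)]
    new_params = [p - u for p, u in zip(params, update)]
    return new_params, new_memory


# Hypothetical values: three parameters, keep only one coordinate per step.
new_params, new_memory = sparsified_step(
    params=[0.0, 0.0, 0.0],
    grad=[1.0, -3.0, 2.0],
    memory=[0.0, 0.0, 0.0],
    lr=1.0,
    k=1,
)
```

Here only the coordinate with the largest magnitude is applied; the other two gradient entries stay in `new_memory` and compete for selection in the next step.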

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

This paper finds that 99.9% of the gradient exchange in distributed SGD is redundant, and proposes Deep Gradient Compression (DGC) to greatly reduce the communication bandwidth, which enables large-scale distributed training on inexpensive commodity 1 Gbps Ethernet and facilitates distributed training on mobile.

Federated Optimization in Heterogeneous Networks

This work introduces a framework, FedProx, to tackle heterogeneity in federated networks, and provides convergence guarantees for this framework when learning over data from non-identical distributions (statistical heterogeneity) while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work.

Decoupled Greedy Learning of CNNs

Decoupled Greedy Learning is considered, based on a greedy relaxation of the joint training objective, recently shown to be effective in the context of Convolutional Neural Networks (CNNs) on large-scale image classification, and it is shown that it can lead to better generalization than sequential greedy optimization.