Time-Correlated Sparsification for Communication-Efficient Federated Learning

@inproceedings{Ozfatura2021TimeCorrelatedSF,
  title={Time-Correlated Sparsification for Communication-Efficient Federated Learning},
  author={Emre Ozfatura and Kerem Ozfatura and Deniz G{\"u}nd{\"u}z},
  booktitle={2021 IEEE International Symposium on Information Theory (ISIT)},
  year={2021},
  pages={461-466}
}
Federated learning (FL) enables multiple clients to collaboratively train a shared model, with the help of a parameter server (PS), without disclosing their local datasets. However, due to the increasing size of the trained models, the communication load due to the iterative exchanges between the clients and the PS often becomes a performance bottleneck. Sparse communication is often employed to reduce the communication load, where only a small subset of the model updates are… 
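
For concreteness, here is a minimal sketch of the generic top-k sparsification that "sparse communication" usually refers to in this setting, written with NumPy. It is not the paper's time-correlated scheme, and the helper names (top_k_sparsify, densify) are illustrative only.

import numpy as np

def top_k_sparsify(update, k):
    """Keep the k largest-magnitude entries of a model update.
    Returns the indices and values a client would transmit to the PS
    instead of the full dense update."""
    flat = update.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the top-k magnitudes
    return idx, flat[idx]

def densify(idx, vals, shape):
    """Rebuild a dense update at the parameter server."""
    dense = np.zeros(int(np.prod(shape)))
    dense[idx] = vals
    return dense.reshape(shape)

# Example: transmit only 1% of a 1000-dimensional update.
rng = np.random.default_rng(0)
update = rng.normal(size=1000)
idx, vals = top_k_sparsify(update, k=10)
recovered = densify(idx, vals, update.shape)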

Citations

Time-Correlated Sparsification for Efficient Over-the-Air Model Aggregation in Wireless Federated Learning
TLDR
This work proposes time-correlated sparsification with hybrid aggregation (TCS-H) for communication-efficient federated edge learning (FEEL), which jointly exploits model compression and over-the-air computation.
Private Federated Submodel Learning with Sparsification
TLDR
A novel scheme is proposed that privately reads from and writes to arbitrary parameters of any given submodel without revealing to the databases the submodel index, the values of the updates, or the coordinates of the sparse updates.
Leveraging Spatial and Temporal Correlations in Sparsified Mean Estimation
TLDR
This work studies the problem of estimating, at a central server, the mean of a set of vectors distributed across several nodes (one vector per node), and provides an analysis of the resulting estimation error together with experiments showing that the proposed estimators consistently outperform more sophisticated and expensive sparsification methods.
Genetic CFL: Hyperparameter Optimization in Clustered Federated Learning
TLDR
This work proposes a novel hybrid algorithm, namely, genetic clustered FL (Genetic CFL), that clusters edge devices based on the training hyperparameters and genetically modifies the parameters clusterwise.
On the Necessity of Aligning Gradients for Wireless Federated Learning
TLDR
It is shown that alignment of gradients is not always necessary for convergence in wireless FL; non-convex loss functions are considered, and conditions are derived under which misaligned wireless gradient aggregation still converges to a stationary point.
Performance-Oriented Design for Intelligent Reflecting Surface Assisted Federated Learning
TLDR
A performance-oriented design scheme that directly minimizes the optimality gap of the loss function is proposed to accelerate the convergence of AirComp-based FL, and the block coordinate descent (BCD) method is adopted to tackle the highly intractable optimization problem.
Multi-Edge Server-Assisted Dynamic Federated Learning with an Optimized Floating Aggregation Point
TLDR
A network-aware CE-FL scheme is proposed that adaptively optimizes all network elements by tuning their contributions to the learning process, which leads to a non-convex mixed-integer problem.
Federated Learning in Edge Computing: A Systematic Survey
TLDR
A systematic survey of the literature on implementing FL in edge computing (EC) environments is provided, together with a taxonomy identifying advanced solutions and open problems, to help researchers better understand the connection between FL and the enabling technologies and concepts of EC.
Distributed Learning in Wireless Networks: Recent Progress and Future Challenges
TLDR
This paper provides a holistic set of guidelines on how to deploy a broad range of distributed learning frameworks over real-world wireless communication networks, including federated learning, federated distillation, distributed inference, and multi-agent reinforcement learning.
Dynamic Scheduling for Over-the-Air Federated Edge Learning With Energy Constraints
TLDR
This work considers an over-the-air FEEL system with analog gradient aggregation, and proposes an energy-aware dynamic device scheduling algorithm to optimize the training performance within the energy constraints of devices, where both communication energy for gradient aggregation and computation energy for local training are considered.
...

References

SHOWING 1-10 OF 60 REFERENCES
Robust and Communication-Efficient Federated Learning From Non-i.i.d. Data
TLDR
Sparse ternary compression (STC), a new compression framework specifically designed to meet the requirements of the federated learning environment, is proposed, and a paradigm shift in federated optimization toward high-frequency low-bitwidth communication is advocated, in particular for bandwidth-constrained learning environments.
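
As a rough, non-authoritative sketch of the top-k-plus-ternarization idea summarized above (assuming NumPy; the actual STC framework additionally uses position encoding and other machinery not shown here):

import numpy as np

def sparse_ternary_compress(update, k):
    """Keep the top-k entries by magnitude and replace them by +/- mu,
    where mu is the mean magnitude of the kept entries."""
    flat = update.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    mu = np.abs(flat[idx]).mean()                  # single shared magnitude
    signs = np.sign(flat[idx]).astype(np.int8)     # ternary values: -1, 0, +1
    return idx, signs, mu                          # k indices, k signs, one float

def sparse_ternary_decompress(idx, signs, mu, shape):
    dense = np.zeros(int(np.prod(shape)))
    dense[idx] = signs * mu
    return dense.reshape(shape)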
Federated Learning With Quantized Global Model Updates
TLDR
A lossy FL (LFL) algorithm is proposed, in which both the global model and the local model updates are quantized before being transmitted, and it is shown that the quantization of the global model can actually improve the performance for non-i.i.d. data distributions.
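
A generic stochastic uniform quantizer is sketched below as one plausible instantiation of the lossy quantization the summary refers to (NumPy-based; the paper's exact quantizer and bit allocation may differ).

import numpy as np

def quantize(x, num_bits=4, rng=None):
    """Stochastic uniform quantization: map x to integers in [0, 2^b - 1].
    Stochastic rounding keeps the quantizer unbiased on average."""
    rng = rng or np.random.default_rng()
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.floor((x - lo) / scale + rng.random(x.shape)).astype(np.int32)
    return q, lo, scale          # only integers plus two floats are transmitted

def dequantize(q, lo, scale):
    return lo + q * scale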
Sparse Communication for Training Deep Networks
TLDR
This work studies several compression schemes, identifies how three key parameters affect the performance of synchronous stochastic gradient descent, and introduces a simple sparsification scheme, the random-block sparsifier, that reduces communication while keeping performance close to that of standard SGD.
Sparsified SGD with Memory
TLDR
This work analyzes Stochastic Gradient Descent with k-sparsification or compression (for instance top-k or random-k) and shows that this scheme converges at the same rate as vanilla SGD when equipped with error compensation.
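
The error-compensation ("memory") mechanism described above can be illustrated with the following hedged sketch (NumPy, flattened gradients assumed; each client keeps its own memory vector across rounds, initialized to zeros).

import numpy as np

def sparsify_with_memory(grad, memory, k):
    """k-sparsification with error feedback: coordinates dropped in this
    round are accumulated locally and added back before the next round's
    selection, so no update is permanently lost."""
    corrected = grad + memory                       # re-inject previously dropped mass
    idx = np.argpartition(np.abs(corrected), -k)[-k:]
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                    # what is actually transmitted
    new_memory = corrected - sparse                 # residual kept for the next round
    return sparse, new_memory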
Adaptive Federated Learning in Resource Constrained Edge Computing Systems
TLDR
This paper analyzes the convergence bound of distributed gradient descent from a theoretical point of view and proposes a control algorithm that determines the best tradeoff between local updates and global parameter aggregation to minimize the loss function under a given resource budget.
Understanding Top-k Sparsification in Distributed Deep Learning
TLDR
The property of the gradient distribution is exploited to propose an approximate top-k selection algorithm that is computationally efficient for GPUs, improving the scaling efficiency of TopK-SGD by significantly reducing the computation overhead.
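
One way to exploit the shape of the gradient distribution for approximate top-k selection is to estimate a magnitude threshold from a Gaussian model of the entries, as in the hedged sketch below (NumPy/SciPy; this illustrates the idea in the summary rather than the cited paper's exact algorithm).

import numpy as np
from scipy.stats import norm

def approx_top_k(grad, k):
    """Threshold-based approximate top-k selection for a flattened gradient.
    If the entries were N(0, sigma^2), roughly k of the n magnitudes would
    exceed the two-sided tail threshold t, so no full sort is needed."""
    n = grad.size
    sigma = grad.std()
    t = sigma * norm.ppf(1.0 - k / (2.0 * n))
    idx = np.flatnonzero(np.abs(grad) >= t)         # approximately k indices
    return idx, grad[idx]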
FedADC: Accelerated Federated Learning with Drift Control
TLDR
Federated learning has become the de facto framework for collaborative learning among edge devices with privacy concerns, and it is shown that both problems can be addressed with a single strategy, without any major alteration to the FL framework and without introducing additional computation or communication load.
Broadband Analog Aggregation for Low-Latency Federated Edge Learning
TLDR
This work designs a low-latency multi-access scheme for edge learning based on a popular privacy-preserving framework, federated edge learning (FEEL), and derives two tradeoffs between communication and learning metrics that are useful for network planning and optimization.
Federated Learning Over Wireless Fading Channels
TLDR
Results show clear advantages for the proposed analog over-the-air DSGD scheme, which suggests that learning and communication algorithms should be designed jointly to achieve the best end-to-end performance in machine learning applications at the wireless edge.
Hierarchical Federated Learning Across Heterogeneous Cellular Networks
TLDR
Small-cell base stations are introduced to orchestrate FEEL among the mobile users (MUs) within their cells and to periodically exchange model updates with the macro base station (MBS) for global consensus; it is shown that this hierarchical federated learning (HFL) scheme significantly reduces the communication latency without sacrificing accuracy.
...