FedBalancer: data and pace control for efficient federated learning on heterogeneous clients

  • Jaemin Shin, Yuanchun Li, Yunxin Liu, Sung-Ju Lee
  • Published 5 January 2022
  • Computer Science
  • Proceedings of the 20th Annual International Conference on Mobile Systems, Applications and Services
Federated Learning (FL) trains a machine learning model on distributed clients without exposing individual data. Unlike centralized training, which usually relies on carefully organized data, FL deals with on-device data that are often unfiltered and imbalanced. As a result, the conventional FL training protocol, which treats all data equally, wastes local computational resources and slows down the global learning process. To this end, we propose FedBalancer, a systematic FL framework…

Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data
The first empirical study to characterize the impacts of heterogeneity in federated learning. It builds a heterogeneity-aware FL platform that complies with the standard FL protocol while taking heterogeneity into account, and its results suggest that FL algorithm designers should account for heterogeneity during evaluation.
Oort: Efficient Federated Learning via Guided Participant Selection
Oort improves time-to-accuracy performance in model training by prioritizing clients that have both data offering the greatest utility for improving model accuracy and the capability to run training quickly. It also aims to let FL developers interpret their results in model testing.
SmartPC: Hierarchical Pace Control in Real-Time Federated Learning System
Proposes SmartPC, a hierarchical online pace control framework for federated learning that balances training time and model accuracy in an energy-efficient manner, and evaluates it through extensive experiments.
LEAF: A Benchmark for Federated Settings
Proposes LEAF, a modular benchmarking framework for learning in federated settings that includes a suite of open-source federated datasets, a rigorous evaluation framework, and a set of reference implementations, all geared towards capturing the obstacles and intricacies of practical federated environments.
Deep Learning Face Attributes in the Wild
A novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently.
Dynamic Sample Selection for Federated Learning with Heterogeneous Data in Fog Computing
This paper presents FedSS, a dynamic sample selection optimization algorithm that tackles heterogeneous data in federated learning, and shows that dynamic sampling methods can effectively improve convergence speed with heterogeneous data while keeping computational costs low and achieving the desired accuracy.
Federated Optimization in Heterogeneous Networks
This work introduces a framework, FedProx, to tackle heterogeneity in federated networks, and provides convergence guarantees for this framework when learning over data from non-identical distributions (statistical heterogeneity) while adhering to device-level systems constraints by allowing each participating device to perform a variable amount of work.
Not All Samples Are Created Equal: Deep Learning with Importance Sampling
A principled importance sampling scheme is proposed that focuses computation on "informative" examples, and reduces the variance of the stochastic gradients during training, and derives a tractable upper bound to the per-sample gradient norm.
Communication-Efficient Learning of Deep Networks from Decentralized Data
This work presents a practical method for the federated learning of deep networks based on iterative model averaging, and conducts an extensive empirical evaluation, considering five different model architectures and four datasets.
Variance Reduction in SGD by Distributed Importance Sampling
This work proposes a framework for distributing deep learning in which one set of workers searches for the most informative examples in parallel while a single worker updates the model on examples selected by importance sampling, so that the model is updated using an unbiased estimate of the gradient.
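
The importance-sampling summaries above mention updating with an unbiased gradient estimate. The following is a minimal sketch of the standard reweighting idea only, not the distributed system from any listed paper. It assumes scalar stand-ins for per-example gradients and importance proportional to gradient magnitude; all function names are hypothetical.

```python
import random

def importance_probs(scores):
    """Normalize positive per-example scores into sampling probabilities."""
    total = sum(scores)
    return [s / total for s in scores]

def sample_reweighted(grads, probs, rng):
    """Draw one example according to probs and scale it by 1 / (N * p_i)."""
    n = len(grads)
    i = rng.choices(range(n), weights=probs, k=1)[0]
    return grads[i] / (n * probs[i])

def expected_estimate(grads, probs):
    """Exact expectation of the reweighted estimator under probs."""
    n = len(grads)
    return sum(p * (g / (n * p)) for g, p in zip(grads, probs))

# Hypothetical scalar "gradients" (all nonzero so every probability is positive).
grads = [0.1, -0.4, 2.0, 0.3]
# Assumption: importance is proportional to gradient magnitude.
probs = importance_probs([abs(g) for g in grads])
uniform_mean = sum(grads) / len(grads)
```

Because each drawn example is scaled by 1 / (N * p_i), the expectation of the estimator equals the plain average of the per-example values, whatever positive probabilities are used; only the variance changes.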