• Corpus ID: 16861557

Federated Learning of Deep Networks using Model Averaging

Authors: H. B. McMahan, Eider Moore, Daniel Ramage, Blaise Agüera y Arcas
Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. [...] We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks that proves robust to the unbalanced and non-IID data distributions that naturally arise.
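
The abstract names model averaging but does not spell out the procedure, so the following is only a minimal sketch of one common form of it, not the authors' implementation. It assumes each client trains locally from the current global weights and the server then averages the returned weights in proportion to each client's example count, which is one way to cope with unbalanced data. The linear least-squares model and the names `local_sgd`, `average_models`, and `federated_round` are hypothetical choices made for illustration.

```python
import numpy as np


def local_sgd(weights, X, y, lr=0.1, epochs=5):
    """Run a few full-batch gradient steps on a least-squares loss (illustrative local training)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w


def average_models(client_weights, client_sizes):
    """Average client models, weighting each by its number of local examples."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


def federated_round(global_w, clients, lr=0.1, epochs=5):
    """One round: every client trains from the global model, then the server averages."""
    updates = [local_sgd(global_w, X, y, lr, epochs) for X, y in clients]
    sizes = [len(y) for _, y in clients]
    return average_models(updates, sizes)


def make_clients(true_w, sizes=(10, 50, 200), shifts=(0.0, 0.5, 1.0), seed=0):
    """Build unbalanced clients whose feature distributions differ (a simple non-IID setup)."""
    rng = np.random.default_rng(seed)
    clients = []
    for n, shift in zip(sizes, shifts):
        X = rng.normal(loc=shift, scale=1.0, size=(n, len(true_w)))
        clients.append((X, X @ true_w))
    return clients


if __name__ == "__main__":
    true_w = np.array([2.0, -1.0])
    clients = make_clients(true_w)
    w = np.zeros_like(true_w)
    for _ in range(50):
        w = federated_round(w, clients)
    print("distance to true weights:", np.linalg.norm(w - true_w))
```

Only model weights leave the clients in this sketch; the raw `(X, y)` pairs stay local, which is the property the decentralized approach relies on.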

Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
It is shown that a distributed, federated, or decentralized deep learning approach is fundamentally broken and does not protect the training sets of honest participants.
Federated Learning: Strategies for Improving Communication Efficiency
Two ways to reduce the uplink communication costs are proposed: structured updates, where the user directly learns an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, which learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling.
FiT: Parameter Efficient Few-shot Transfer Learning for Personalized and Federated Image Classification
FiLM Transfer (FiT) is developed, which fulfills requirements in the image classification setting and achieves better classification accuracy than the state-of-the-art Big Transfer algorithm in low-shot settings and on the challenging VTAB-1k benchmark, with fewer than 1% of the updateable parameters.
Network Update Compression for Federated Learning
Experiments on convolutional neural network (CNN) models showed the proposed model can significantly reduce the uplink communication cost in federated learning while preserving reasonable accuracy.
Fidel: Reconstructing Private Training Samples from Weight Updates in Federated Learning
It is shown how to recover on average twenty out of thirty private data samples from a client’s model update when a fully connected neural network is used, with very little computational resources required; over thirteen out of twenty samples can be recovered from a convolutional neural network update.
Decentralized Deep Learning with Arbitrary Communication Compression
The use of communication compression in the decentralized training context achieves linear speedup in the number of workers and supports higher compression than previous state-of-the-art methods.
Evaluating the Communication Efficiency in Federated Learning Algorithms
This research begins with the fundamentals of FL, then highlights recent FL algorithms and evaluates their communication efficiency with detailed comparisons, and proposes a set of solutions to alleviate existing FL problems from both a communication perspective and a privacy perspective.
Accelerating DNN Training in Wireless Federated Edge Learning Systems
This work considers a newly emerged framework, federated edge learning, which aggregates local learning updates at the network edge in lieu of users’ raw data to accelerate the training process, and suggests that the proposed algorithm is applicable in more general systems.
Crowdlearning: Crowded Deep Learning with Data Privacy
This paper proposes a novel idea, Crowdlearning, which decentralizes the heavy-load training procedure and deploys the training onto a crowd of computation-restricted mobile devices that generate the training data, and proposes SliceNet, which ensures mobile devices can afford the computation cost while minimizing the total communication cost.
Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints
Clustered FL (CFL) is a novel federated multitask learning (FMTL) framework that exploits geometric properties of the FL loss surface to group the client population into clusters with jointly trainable data distributions, and comes with strong mathematical guarantees on the clustering quality.


Privacy-preserving deep learning
  • R. Shokri, Vitaly Shmatikov
  • Computer Science
    2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • 2015
This paper presents a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets, and exploits the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously.
Large Scale Distributed Deep Networks
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Dropout: a simple way to prevent neural networks from overfitting
It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
Distributed Learning, Communication Complexity and Privacy
General upper and lower bounds on the amount of communication needed to learn well are provided, showing that in addition to VC-dimension and covering number, quantities such as the teaching-dimension and mistake-bound of a class play an important role.
Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures
A new class of model inversion attack is developed that exploits confidence values revealed along with predictions and is able to estimate whether a respondent in a lifestyle survey admitted to cheating on their significant other and recover recognizable images of people's faces given only their name.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Adam: A Method for Stochastic Optimization
This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal functions that can be chosen in hindsight.
Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss
A communication-efficient distributed algorithm to minimize the overall empirical loss, which is the average of the local empirical losses of the distributed computing system, based on an inexact damped Newton method.