Corpus ID: 235421628

Dynamic Gradient Aggregation for Federated Domain Adaptation

@article{Dimitriadis2021DynamicGA,
  title={Dynamic Gradient Aggregation for Federated Domain Adaptation},
  author={Dimitrios Dimitriadis and Ken'ichi Kumatani and Robert Gmyr and Yashesh Gaur and Sefik Emre Eskimez},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.07578}
}
In this paper, a new learning algorithm for Federated Learning (FL) is introduced. The proposed scheme is based on a weighted gradient aggregation using two-step optimization to offer a flexible training pipeline. Herein, two different flavors of the aggregation method are presented, leading to an order of magnitude improvement in convergence speed compared to other distributed or FL training algorithms like BMUF and FedAvg. Further, the aggregation algorithm acts as a regularizer of the…
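As a minimal sketch of the weighted-gradient-aggregation idea (not the paper's exact Dynamic Gradient Aggregation, whose weights come from a learned two-step optimization), the snippet below weights each client's gradient by a softmax over hypothetical held-out losses before the server applies the aggregate; the weighting rule and all names are illustrative assumptions.

```python
import numpy as np

def aggregate_gradients(client_grads, client_losses, temperature=1.0):
    """Combine per-client gradients into a single server-side update.

    client_grads  : list of 1-D np.ndarray, one flattened gradient per client
    client_losses : list of float, a per-client quality signal (assumed here
                    to be a held-out loss; lower loss -> larger weight)
    """
    losses = np.asarray(client_losses, dtype=np.float64)
    # Softmax over negative losses: better-performing clients get more weight.
    logits = -losses / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Weighted sum replaces the uniform (or sample-count) average of FedAvg.
    grads = np.stack(client_grads)
    return (weights[:, None] * grads).sum(axis=0)

# Example: the third, noisy client (high loss) is strongly down-weighted.
grads = [np.array([0.2, -0.1]), np.array([0.25, -0.12]), np.array([5.0, 4.0])]
losses = [0.4, 0.5, 3.0]
print(aggregate_gradients(grads, losses))
```

Down-weighting unreliable clients in this way is also what gives the aggregation step its regularizing effect mentioned in the abstract.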


References

SHOWING 1-10 OF 28 REFERENCES
Federated Transfer Learning with Dynamic Gradient Aggregation
TLDR
This is the first attempt to apply FL techniques to Speech Recognition tasks, which had previously been avoided due to their inherent complexity, and the proposed Federated Learning system is shown to outperform the gold standard of distributed training in both convergence speed and overall model performance.
Federated Learning for Keyword Spotting
TLDR
An extensive empirical study of the federated averaging algorithm for the "Hey Snips" wake word, based on a crowdsourced dataset that mimics a federation of wake word users, shows that using an adaptive averaging strategy inspired by Adam greatly reduces the number of communication rounds required to reach the target performance.
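One way to read the "adaptive averaging strategy inspired by Adam" is that the server treats the mean client update as a pseudo-gradient and applies Adam-style moment estimates to it; the sketch below follows that reading, with hyperparameters and placement chosen for illustration rather than taken from the paper.

```python
import numpy as np

class AdamServerAggregator:
    """Server-side Adam applied to the averaged client update (pseudo-gradient)."""

    def __init__(self, dim, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m = np.zeros(dim)   # first-moment estimate
        self.v = np.zeros(dim)   # second-moment estimate
        self.t = 0
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps

    def step(self, global_weights, client_updates):
        # Plain FedAvg direction: the mean of the client deltas.
        delta = np.mean(client_updates, axis=0)
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * delta
        self.v = self.beta2 * self.v + (1 - self.beta2) * delta ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return global_weights + self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```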
Sequence-level self-learning with multi-task learning framework
TLDR
New self-learning techniques are proposed with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR), using the multi-task learning (MTL) framework where the n-th best ASR hypothesis is used as the label of each task.
A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection
  • Q. Li, Zeyi Wen, B. He
  • Computer Science, Mathematics
  • IEEE Transactions on Knowledge and Data Engineering
  • 2021
TLDR
A comprehensive review of federated learning systems is conducted and a thorough categorization is provided according to six different aspects, including data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation and motivation of federation.
Online Learning to Sample
TLDR
This work shows that SGD can be used to learn the best possible sampling distribution for an importance sampling estimator, and that the sampling distribution of an SGD algorithm can be estimated online by incrementally minimizing the variance of the gradient.
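A common way to realize this idea is to keep a running estimate of each example's gradient norm and sample proportionally to it, since that choice minimizes the variance of an importance-sampled gradient estimator. The sketch below uses an exponential-moving-average heuristic rather than the paper's SGD-based learning of the distribution, and all constants and names are assumptions.

```python
import numpy as np

class OnlineSampler:
    def __init__(self, n_examples, decay=0.9, floor=1e-3):
        self.scores = np.ones(n_examples)   # running per-example gradient norms
        self.decay, self.floor = decay, floor

    def probabilities(self):
        p = self.scores + self.floor        # floor keeps every example reachable
        return p / p.sum()

    def sample(self, batch_size, rng=np.random):
        p = self.probabilities()
        idx = rng.choice(len(p), size=batch_size, replace=True, p=p)
        # Importance weights keep the resulting gradient estimator unbiased.
        iw = 1.0 / (len(p) * p[idx])
        return idx, iw

    def update(self, idx, grad_norms):
        # Refresh the scores of the examples just seen with their new norms.
        self.scores[idx] = (self.decay * self.scores[idx]
                            + (1 - self.decay) * grad_norms)
```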
Online Deep Learning: Learning Deep Neural Networks on the Fly
TLDR
A new ODL framework is presented that tackles these challenges by learning DNN models which dynamically adapt their depth from a sequence of training data in an online learning setting, proposing a novel Hedge Backpropagation method for effectively updating the parameters of the DNN online.
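The hedging mechanism can be sketched as a set of per-depth classifiers whose predictions are mixed with weights that are updated multiplicatively from each classifier's loss; the discount factor, smoothing floor, and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def combine_predictions(per_depth_probs, alpha):
    """per_depth_probs: (num_depths, num_classes); alpha: (num_depths,)."""
    return alpha @ per_depth_probs                   # weighted ensemble over depths

def update_hedge_weights(alpha, per_depth_losses, beta=0.99, floor=1e-3):
    # Multiplicative (Hedge) update: classifiers with lower loss gain weight.
    alpha = alpha * beta ** np.asarray(per_depth_losses)
    alpha = np.maximum(alpha / alpha.sum(), floor)   # floor keeps deep layers trainable
    return alpha / alpha.sum()
```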
Federated Optimization: Distributed Optimization Beyond the Datacenter
We introduce a new and increasingly relevant setting for distributed optimization in machine learning, where the data defining the optimization are distributed (unevenly) over an extremely large number of nodes.
Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering
  • Kai Chen, Qiang Huo
  • Computer Science
  • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2016
TLDR
This work successfully trains deep bidirectional long short-term memory (LSTM) recurrent neural networks and fully-connected feed-forward deep neural networks (DNNs) for large vocabulary continuous speech recognition on two benchmark tasks, namely the 309-hour Switchboard-I and the 1,860-hour "Switchboard+Fisher" tasks.
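Blockwise model-update filtering (BMUF), the method behind this reference and one of the baselines in the abstract, can be sketched as: average the workers' models after a data block, treat the difference from the global model as a block-level update, smooth it with block momentum, and apply it. The constants below and the omission of the Nesterov-style variant are simplifications for illustration, not the authors' code.

```python
import numpy as np

def bmuf_step(global_weights, worker_weights, state, block_momentum=0.9,
              block_lr=1.0):
    """One BMUF synchronization.

    worker_weights : list of np.ndarray, worker models after intra-block training
    state          : dict holding the filtered update 'delta' carried across blocks
    """
    avg = np.mean(worker_weights, axis=0)
    raw_update = avg - global_weights                 # aggregated block-level update
    delta = (block_momentum * state.get("delta", np.zeros_like(avg))
             + block_lr * raw_update)                 # blockwise model-update filtering
    state["delta"] = delta
    return global_weights + delta
```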
Layer Trajectory BLSTM
TLDR
This study compares the performance of bidirectional LSTM (BLSTM) and layer trajectory (LT) models, and applies the layer trajectory idea to further improve BLSTM models, in which the BLSTM is in charge of modeling the temporal information while a depth-LSTM takes care of senone classification.
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models
  • C. Chiu, T. Sainath, +11 authors M. Bacchiani
  • Computer Science, Engineering
  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
TLDR
A variety of structural and optimization improvements to the Listen, Attend, and Spell model are explored, which significantly improve performance, and a multi-head attention architecture is introduced, which offers improvements over the commonly-used single-head attention.