A Random Gossip BMUF Process for Neural Language Modeling

@inproceedings{Huang2020ARG,
  title={A Random Gossip BMUF Process for Neural Language Modeling},
  author={Yiheng Huang and Jinchuan Tian and Lei Han and Guangsen Wang and Xingcheng Song and Dan Su and Dong Yu},
  booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2020},
  pages={7959--7963}
}
The LSTM language model is an essential component of industrial ASR systems. [...] Key Method: We apply this method to several LSTM language modeling tasks. Experimental results show that our approach consistently outperforms conventional BMUF. In particular, we obtain a lower perplexity than the single-GPU baseline on the WikiText-103 benchmark using 4 GPUs. In addition, no performance degradation is incurred when scaling to 8 and 16 GPUs. Last but not least, our approach has a much…
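The block-level update behind BMUF, and the gossip step that replaces its global all-reduce, can be sketched as follows. This is a minimal illustration assuming the standard block-momentum formulation (local training, then a filtered update of the global model), with random pairwise averaging standing in for the gossip exchange; function names and hyperparameters here are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def bmuf_update(w_prev, worker_models, delta_prev,
                block_momentum=0.9, block_lr=1.0):
    """One BMUF block update (illustrative sketch).

    w_prev        -- global model parameters at the previous sync point
    worker_models -- list of locally trained parameter copies, one per worker
    delta_prev    -- previous block-level update (momentum buffer)
    """
    w_bar = np.mean(worker_models, axis=0)              # aggregate workers
    g = w_bar - w_prev                                  # block-level "gradient"
    delta = block_momentum * delta_prev + block_lr * g  # filtered update
    w_new = w_prev + delta
    return w_new, delta

def random_gossip_round(worker_models, rng):
    """Random gossip step: randomly paired workers average their models,
    replacing the global all-reduce with cheap pairwise exchanges."""
    idx = rng.permutation(len(worker_models))
    for a, b in zip(idx[::2], idx[1::2]):
        avg = 0.5 * (worker_models[a] + worker_models[b])
        worker_models[a] = avg.copy()
        worker_models[b] = avg.copy()
    return worker_models
```

With block_momentum set to 0 and block_lr to 1, `bmuf_update` reduces to plain model averaging; the momentum term is what filters noise out of the block-level update as the number of workers grows.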
