Very deep multilingual convolutional neural networks for LVCSR

@article{Sercu2015VeryDM,
  title={Very deep multilingual convolutional neural networks for LVCSR},
  author={Tom Sercu and Christian Puhrsch and Brian Kingsbury and Yann LeCun},
  journal={2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2016},
  pages={4955-4959}
}
Convolutional neural networks (CNNs) are a standard component of many current state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) systems. However, CNNs in LVCSR have not kept pace with recent advances in other domains where deeper neural networks provide superior performance. In this paper we propose a number of architectural advances in CNNs for LVCSR. First, we introduce a very deep convolutional network architecture with up to 14 weight layers. There are multiple…
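To make the "very deep" idea concrete: the design follows the VGG philosophy of stacking several small 3x3 convolutions between pooling layers instead of one large-kernel convolution. The PyTorch sketch below is a minimal illustration under that assumption; the block sizes, the 32x32 input window, and the 6,000-way output layer are hypothetical placeholders, not the paper's exact configuration.

```python
# Hypothetical sketch of a very deep (VGG-style) CNN acoustic model:
# stacks of small 3x3 convolutions over a time x frequency feature map,
# followed by fully connected layers predicting context-dependent HMM
# states. 10 conv + 2 FC weight layers here; the paper goes up to 14.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 conv+ReLU layers followed by 2x2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return layers

class VeryDeepCNN(nn.Module):
    def __init__(self, n_targets=6000):  # n_targets is illustrative
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(1, 64, 2),
            *conv_block(64, 128, 2),
            *conv_block(128, 256, 3),
            *conv_block(256, 512, 3),
        )
        # After four 2x2 poolings, a 32x32 input becomes 512 x 2 x 2.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 2 * 2, 2048), nn.ReLU(inplace=True),
            nn.Linear(2048, n_targets),
        )

    def forward(self, x):  # x: (batch, 1, time, freq)
        return self.classifier(self.features(x))

# Example: a batch of 32 windows of 32 frames x 32 log-mel bins.
model = VeryDeepCNN()
logits = model(torch.randn(32, 1, 32, 32))  # -> (32, 6000)
```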

Key Quantitative Results

  • We evaluate the improvements first on a Babel task for low resource speech recognition, obtaining an absolute 5.77% WER improvement over the baseline PLP DNN by training our CNN on the combined data of six different languages. We then evaluate the very deep CNNs on the Hub5'00 benchmark (using the 262 hours of SWB-1 training data), achieving a word error rate of 11.8% after cross-entropy training, a 1.4% WER improvement (10.6% relative; verified in the sketch after this list) over the best published CNN result so far.
  • We showed an improvement of 2.50% WER over a standard DNN PLP baseline using 3 hours of data, and an improvement of 5.77% WER by combining six languages to train on 18 hours of data.
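The relative-improvement figure in the first bullet can be reproduced directly: an 11.8% WER after a 1.4% absolute gain implies a 13.2% prior best, and 1.4/13.2 ≈ 10.6%. A minimal check:

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Hub5'00 figures quoted above: 11.8% WER after cross-entropy training,
# a 1.4% absolute improvement, so the implied prior best is 13.2%.
baseline = 11.8 + 1.4
print(f"{relative_wer_reduction(baseline, 11.8):.1f}% relative")  # 10.6% relative
```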

Citations

Publications citing this paper.
Showing 8 of 123 citations.

Using a Fine-Tuning Method for a Deep Authentication in Mobile Cloud Computing Based on Tensorflow Lite Framework

Abdelhakim Zeroual, Makhlouf Derdour, Mohamed Amroune, Atef Bentahar
  • 2019 International Conference on Networking and Advanced Systems (ICNAS)
  • 2019
  • Cites methods; highly influenced

Adversarial Multilingual Training for Low-Resource Speech Recognition

  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
  • Cites background and methods; highly influenced

Recent progresses in deep learning based acoustic models

  • IEEE/CAA Journal of Automatica Sinica
  • 2017
  • Cites background and methods; highly influenced

Noise Robust Speech Recognition on Aurora4 by Humans and Machines

  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
  • Cites methods and background; highly influenced

Gated convolutional networks based hybrid acoustic models for low resource speech recognition

  • 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
  • 2017
  • Cites background; highly influenced

Network architectures for multilingual speech representation learning

  • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2017
  • Cites methods and background

Scalable algorithms for unsupervised clustering of acoustic data for speech recognition

  • Computer Speech & Language
  • 2017
  • Cites background; highly influenced

Advances in Very Deep Convolutional Neural Networks for LVCSR

  • INTERSPEECH
  • 2016
  • Cites background, methods, and results

Citation Statistics

  • 21 highly influenced citations
  • Averaged 30 citations per year from 2017 through 2019

References

Publications referenced by this paper.
Showing 4 of 31 references.

ADADELTA: An Adaptive Learning Rate Method

  • arXiv:1212.5701
  • 2012
  • Highly influential

Joint training of convolutional and non-convolutional neural networks

  • 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
  • Highly influential

Deep convolutional neural networks for LVCSR

  • 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2013

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks

  • 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015