The IBM 2016 English Conversational Telephone Speech Recognition System

@article{Saon2016TheI2,
  title={The IBM 2016 English Conversational Telephone Speech Recognition System},
  author={George Saon and Tom Sercu and Steven J. Rennie and Hong-Kwang Jeff Kuo},
  journal={ArXiv},
  year={2016},
  volume={abs/1604.08242}
}
We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation testset. On the acoustic side, we use a score fusion of three strong models: recurrent nets with maxout activations, very deep convolutional nets with 3x3 kernels, and bidirectional long short-term memory nets which operate on FMLLR and i-vector features. On the language…
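The headline figure in the abstract is a word error rate (WER). As a minimal sketch of how that metric is commonly defined, the Python below computes the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical and this is not the authors' scoring pipeline; evaluation scoring typically also applies text normalization, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 6.6% corresponds to roughly 66 word errors per 1,000 reference words.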

Key Quantitative Results

  • We describe a collection of acoustic and language modeling techniques that lowered the word error rate of our English conversational telephone LVCSR system to a record 6.6% on the Switchboard subset of the Hub5 2000 evaluation testset.

Citations

Publications citing this paper.
SHOWING 1-10 OF 66 CITATIONS

Articulatory Information and Multiview Features for Large Vocabulary Continuous Speech Recognition

  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
CITES BACKGROUND, METHODS & RESULTS
HIGHLY INFLUENCED

End-to-end speech recognition and keyword search on low-resource languages

  • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2017
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

English Conversational Telephone Speech Recognition by Humans and Machines

  • INTERSPEECH
  • 2017
CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Morph-to-word transduction for accurate and efficient automatic speech recognition and keyword search

  • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2017
CITES METHODS
HIGHLY INFLUENCED

State of the art in Speech Recognition

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Combining Deep Neural Network with SVM to Identify Used in IOT

  • 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC)
  • 2019
CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Forward-Backward Convolutional LSTM for Acoustic Modeling

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Kernel Approximation Methods for Speech Recognition

  • J. Mach. Learn. Res.
  • 2017
HIGHLY INFLUENCED

Unfolded Deep Recurrent Convolutional Neural Network with Jump Ahead Connections for Acoustic Modeling

  • INTERSPEECH
  • 2017
CITES METHODS, RESULTS & BACKGROUND
HIGHLY INFLUENCED

CITATION STATISTICS

  • 10 Highly Influenced Citations

  • Averaged 20 Citations per year from 2017 through 2019
