The IBM 2015 English Conversational Telephone Speech Recognition System

@article{Saon2015TheI2,
  title={The IBM 2015 English Conversational Telephone Speech Recognition System},
  author={George Saon and Hong-Kwang Jeff Kuo and Steven J. Rennie and Michael Picheny},
  journal={ArXiv},
  year={2015},
  volume={abs/1505.05899}
}
We describe the latest improvements to the IBM English conversational telephone speech recognition system. Some of the techniques that were found beneficial are: maxout networks with annealed dropout rates; networks with a very large number of outputs trained on 2000 hours of data; joint modeling of partially unfolded recurrent neural networks and convolutional nets by combining the bottleneck and output layers and retraining the resulting model; and lastly, sophisticated language model…
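One of the listed techniques, annealed dropout, gradually lowers the dropout probability over the course of training (see "Annealed dropout training of deep networks", SLT 2014, in the references below). A minimal sketch of the idea, assuming a simple linear schedule; the schedule actually used in the paper may differ:

```python
def annealed_dropout_rate(step, total_steps, p_init=0.5):
    """Linearly anneal the dropout probability from p_init down to 0.

    High dropout early in training regularizes aggressively; as the
    rate decays toward 0, the stochastic ensemble of subnetworks
    converges toward a single deterministic model.
    """
    frac = min(step / total_steps, 1.0)
    return p_init * (1.0 - frac)

# Example: dropout rate at the start, middle, and end of training.
print(annealed_dropout_rate(0, 100))    # 0.5
print(annealed_dropout_rate(50, 100))   # 0.25
print(annealed_dropout_rate(100, 100))  # 0.0
```

The starting probability `p_init = 0.5` and the linear decay are illustrative assumptions, not values taken from the paper.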


Key Quantitative Results

  • These techniques result in an 8.0% word error rate on the Switchboard part of the Hub5-2000 evaluation test set, which is 23% relative better than our previous best published result.
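The two numbers in the quoted result are mutually consistent: if the new WER is 8.0% after a 23% relative reduction, the previous WER w satisfies w * (1 - 0.23) = 8.0, i.e. w is roughly 10.4%. A quick arithmetic check (the back-derived prior WER is an inference from the quoted figures, not a number stated on this page):

```python
new_wer = 8.0          # % WER on the Switchboard part of Hub5-2000
relative_gain = 0.23   # 23% relative improvement
prev_wer = new_wer / (1.0 - relative_gain)
print(round(prev_wer, 1))  # 10.4
```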

Citations

Publications citing this paper. Showing 1-10 of 72 citations.

Adaptive Very Deep Convolutional Residual Network for Noise Robust Speech Recognition

  • IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018
  • Highly influenced; cites background & methods

Advances in Very Deep Convolutional Neural Networks for LVCSR

  • INTERSPEECH, 2016
  • Highly influenced; cites methods

Very deep multilingual convolutional neural networks for LVCSR

  • 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Highly influenced; cites background, results & methods

One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-Off in Machine Learning Cloud Service APIs via Tolerance Tiers

  • 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
  • Highly influenced; cites background & methods

Multilingual techniques for low resource automatic speech recognition

  • Highly influenced; cites background & methods

Training variance and performance evaluation of neural networks in speech

  • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Cites background & methods

Accelerating Conversational Agents Built With Off-the-Shelf Modularized Services

Jinho Lee, Inseok Hwang, Thomas S. Hubregtsen, Anne E. Gattiker, Christopher M. Durham
  • IEEE Pervasive Computing, 2019
  • Cites methods


Citation Statistics

  • 8 highly influenced citations

  • Averaged 13 citations per year from 2017 through 2019

References

Publications referenced by this paper. Showing 1-10 of 30 references.

Performance of the IBM LVCSR system on the Switchboard corpus

F. Liu, M. Monkowski, +3 authors, P. Rao
  • Proceedings of Speech Research Symposium, 1995, p. 189
  • Highly influential

Joint training of convolutional and non-convolutional neural networks

  • 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Continuous space language models

  • Computer Speech & Language, 2007
  • Highly influential

Empirical study of neural network language models for Arabic speech recognition

  • 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU)
  • Highly influential

A Neural Syntactic Language Model

  • Machine Learning, 2005
  • Highly influential

Annealed dropout training of deep networks

  • 2014 IEEE Spoken Language Technology Workshop (SLT)