Deep bi-directional recurrent networks over spectral windows

@inproceedings{Mohamed2015DeepBR,
  title={Deep bi-directional recurrent networks over spectral windows},
  author={Abdel-rahman Mohamed and Frank Seide and Dong Yu and Jasha Droppo and Andreas Stolcke and Geoffrey Zweig and Gerald Penn},
  booktitle={2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)},
  year={2015},
  pages={78--83}
}
Long short-term memory (LSTM) acoustic models have recently achieved state-of-the-art results on speech recognition tasks. As a type of recurrent neural network, LSTMs potentially have the ability to model long-span phenomena relating the spectral input to linguistic units. However, it has not been clear whether their observed performance is actually due to this capability, or instead due to better modeling of short-term dynamics through the recurrence. In this paper, we answer this…


Key Quantitative Results

  • On the SWBD/Fisher corpus, applying bidirectional LSTM RNNs to spectral windows of about 0.5 s improves WER on the Hub5'00 benchmark set by 16% relative compared to our best sequence-trained DNN. On an extended 3850 h training set that also includes lectures, the relative gain grows to 28% (Hub5'00 WER 9.2%).
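The core idea in the title — running a bidirectional recurrent network over a short spectral window rather than the whole utterance — hinges on slicing the frame sequence into fixed-length windows. A minimal NumPy sketch of that windowing step, assuming a 10 ms frame hop so that 50 frames cover roughly 0.5 s; the function name, zero-padding scheme, and per-frame centering are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def spectral_windows(frames, win_frames=50):
    """Slice a (T, F) sequence of spectral frames into one fixed-length
    window per frame, centered on that frame and zero-padded at the
    utterance edges. With a 10 ms hop, win_frames=50 spans ~0.5 s.
    Returns an array of shape (T, win_frames, F)."""
    T, F = frames.shape
    half = win_frames // 2
    # Pad half a window of zeros on each side so every frame has context.
    padded = np.pad(frames, ((half, half), (0, 0)))
    # Window t covers padded[t : t + win_frames]; its center index `half`
    # lands exactly on original frame t.
    return np.stack([padded[t:t + win_frames] for t in range(T)])
```

A bidirectional LSTM would then be run independently over each window to produce the posterior for its center frame, instead of recurring over the full utterance.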

Citations

Publications citing this paper.
SHOWING 1-10 OF 28 CITATIONS

Combining Speech and Speaker Recognition - A Joint Modeling Approach

CITES METHODS & BACKGROUND
HIGHLY INFLUENCED

Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models

CITES BACKGROUND & METHODS
HIGHLY INFLUENCED

Detecting Institutional Dialog Acts in Police Traffic Stops

  • Transactions of the Association for Computational Linguistics
  • 2018
CITES METHODS & RESULTS
HIGHLY INFLUENCED
