Learning the speech front-end with raw waveform CLDNNs

@inproceedings{Sainath2015LearningTS,
  title={Learning the speech front-end with raw waveform CLDNNs},
  author={Tara N. Sainath and Ron J. Weiss and Andrew W. Senior and Kevin W. Wilson and Oriol Vinyals},
  booktitle={INTERSPEECH},
  year={2015}
}
Learning an acoustic model directly from the raw waveform has been an active area of research. However, waveform-based models have not yet matched the performance of log-mel-trained neural networks. We will show that raw waveform features match the performance of log-mel filterbank energies when used with a state-of-the-art CLDNN acoustic model trained on over 2,000 hours of speech. Specifically, we will show the benefit of the CLDNN, namely the time convolution layer in reducing temporal…
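The abstract describes a time-convolution layer applied directly to the raw waveform, with pooling over time to reduce temporal variation, feeding the rest of the CLDNN. A minimal NumPy sketch of such a front-end follows; the filter length, filter count, pooling size, and log compression here are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def raw_waveform_frontend(waveform, filters, pool_size):
    """Illustrative time-convolution front-end over a raw waveform.

    filters: list of 1-D FIR filters (learned in the actual model;
    random here). Returns log-compressed, max-pooled activations.
    """
    # Convolve the waveform with each filter (valid mode, stride 1)
    acts = np.stack([np.convolve(waveform, f, mode="valid") for f in filters])
    acts = np.maximum(acts, 0.0)  # ReLU nonlinearity
    # Max-pool over time to reduce temporal variation
    t = acts.shape[1] - acts.shape[1] % pool_size
    pooled = acts[:, :t].reshape(len(filters), -1, pool_size).max(axis=2)
    return np.log(pooled + 1e-6)  # stabilized log compression

rng = np.random.default_rng(0)
wave = rng.standard_normal(16000)                             # 1 s at 16 kHz
banks = [rng.standard_normal(400) * 0.01 for _ in range(40)]  # 25 ms filters
feats = raw_waveform_frontend(wave, banks, pool_size=160)
print(feats.shape)  # (40, 97): one row per filter, pooled over time
```

The output plays the role that log-mel filterbank energies play in a conventional front-end, which is why the two feature types end up directly comparable.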

Key Quantitative Results

  • In addition, by stacking raw waveform features with log-mel features, we achieve a 3% relative reduction in word error rate.
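The stacking mentioned above amounts to per-frame feature concatenation before the acoustic model. A minimal sketch, assuming 40-dimensional features per stream and random values purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
num_frames = 100
log_mel = rng.standard_normal((num_frames, 40))    # log-mel filterbank energies
raw_feats = rng.standard_normal((num_frames, 40))  # learned raw-waveform features

# Stack the two feature streams frame by frame as the acoustic model input
stacked = np.concatenate([log_mel, raw_feats], axis=1)
print(stacked.shape)  # (100, 80)
```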

Citations

Publications citing this paper.
Showing 1-10 of 196 citations (estimated 90% coverage).

Convolutional gated recurrent neural network incorporating spatial features for audio tagging

  • 2017 International Joint Conference on Neural Networks (IJCNN)
  • Cites background and methods; highly influenced

Learning environmental sounds with end-to-end convolutional neural network

  • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Cites background, results, and methods; highly influenced

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition

  • IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017
  • Cites background and methods; highly influenced

Learning Multiscale Features Directly From Waveforms

  • Cites background; highly influenced

Novel Unsupervised Auditory Filterbank Learning Using Convolutional RBM for Speech Recognition

  • IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016
  • Cites results and methods; highly influenced

Multi-Span Acoustic Modelling using Raw Waveform Signals

  • arXiv, 2019
  • Cites background and methods; highly influenced

Citation Statistics

  • 26 highly influenced citations
  • Averaged 49 citations per year over the last 3 years

References

Publications referenced by this paper.
Showing 1-10 of 20 references.

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks

  • 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Highly influential

Speech acoustic modeling from raw multichannel waveforms

  • 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Highly influential

Learning a better representation of speech soundwaves using restricted Boltzmann machines

  • 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • Highly influential

Asynchronous stochastic optimization for sequence training of deep neural networks

  • 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Deep convolutional neural networks for LVCSR

  • 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Improvements to Deep Convolutional Neural Networks for LVCSR

  • 2013 IEEE Workshop on Automatic Speech Recognition and Understanding