Learning the speech front-end with raw waveform CLDNNs

@inproceedings{Sainath2015LearningTS,
  title={Learning the speech front-end with raw waveform CLDNNs},
  author={Tara N. Sainath and Ron J. Weiss and Andrew W. Senior and Kevin W. Wilson and Oriol Vinyals},
  booktitle={INTERSPEECH},
  year={2015}
}
Learning an acoustic model directly from the raw waveform has been an active area of research. However, waveform-based models have not yet matched the performance of log-mel trained neural networks. We will show that raw waveform features match the performance of log-mel filterbank energies when used with a state-of-the-art CLDNN acoustic model trained on over 2,000 hours of speech. Specifically, we will show the benefit of the CLDNN, namely the time convolution layer in reducing temporal…
This paper has highly influenced 15 other papers.
This paper has 214 citations.

From This Paper

Key Quantitative Results

  • In addition, by stacking raw waveform features with log-mel features, we achieve a 3% relative reduction in word error rate.

Citations

Publications citing this paper.

Convolutional gated recurrent neural network incorporating spatial features for audio tagging

2017 International Joint Conference on Neural Networks (IJCNN) • 2017

Learning environmental sounds with end-to-end convolutional neural network

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) • 2017

Novel Unsupervised Auditory Filterbank Learning Using Convolutional RBM for Speech Recognition

IEEE/ACM Transactions on Audio, Speech, and Language Processing • 2016

Towards End-to-End Speech Recognition


Delayed Skip Connections for Music Content Driven Motion Generation

Nelson Yalta • 2018

Citations per Year (chart covering 2015 to 2019)
Semantic Scholar estimates that this publication has 215 citations based on the available data.

