Learning the speech front-end with raw waveform CLDNNs

Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin W. Wilson, Oriol Vinyals

Learning an acoustic model directly from the raw waveform has been an active area of research. However, waveform-based models have not yet matched the performance of log-mel trained neural networks. We will show that raw waveform features match the performance of log-mel filterbank energies when used with a state-of-the-art CLDNN acoustic model trained on over 2,000 hours of speech. Specifically, we will show the benefit of the CLDNN, namely the time convolution layer in reducing temporal…




