Corpus ID: 966801

Learning the speech front-end with raw waveform CLDNNs

@inproceedings{Sainath2015LearningTS,
  title={Learning the speech front-end with raw waveform CLDNNs},
  author={Tara N. Sainath and Ron J. Weiss and Andrew W. Senior and Kevin W. Wilson and Oriol Vinyals},
  booktitle={INTERSPEECH},
  year={2015}
}
  • Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin W. Wilson, Oriol Vinyals
  • Published in INTERSPEECH 2015
  • Computer Science
  • Learning an acoustic model directly from the raw waveform has been an active area of research. [...] Specifically, we will show the benefit of the CLDNN, namely the time convolution layer in reducing temporal variations, the frequency convolution layer for preserving locality and reducing frequency variations, as well as the LSTM layers for temporal modeling. In addition, by stacking raw waveform features with log-mel features, we achieve a 3% relative reduction in word error rate.
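
The abstract describes how the architecture's pieces fit together: a time convolution over the raw waveform, a frequency convolution over the learned filterbank outputs, LSTM layers for temporal modeling, and a fully connected output stage. The sketch below is a reading aid in PyTorch, not the paper's implementation: the kernel width, stride, filter counts, hidden sizes, and output dimension are assumptions, the paper's pooling and log compression after the time convolution is approximated by a strided convolution, and the log-mel feature stacking mentioned at the end of the abstract is omitted.

```python
# Minimal, hypothetical sketch of a raw-waveform CLDNN acoustic model.
# All layer sizes below are illustrative assumptions, not the paper's values.
import torch
import torch.nn as nn


class RawWaveformCLDNN(nn.Module):
    def __init__(self, num_filters: int = 40, num_targets: int = 42):
        super().__init__()
        # Time convolution over the raw waveform acts as a learned filterbank;
        # the stride (standing in for the paper's pooling over time) reduces
        # temporal variation and yields frame-level features.
        self.time_conv = nn.Conv1d(1, num_filters, kernel_size=400, stride=160)
        # Frequency convolution over the learned filterbank outputs preserves
        # locality and reduces variation along the learned "frequency" axis.
        self.freq_conv = nn.Conv1d(1, 8, kernel_size=8)
        self.freq_pool = nn.MaxPool1d(kernel_size=3)
        freq_feat = 8 * ((num_filters - 8 + 1) // 3)  # flattened freq-conv size
        # LSTM layers model longer-range temporal structure across frames.
        self.lstm = nn.LSTM(freq_feat, 256, num_layers=2, batch_first=True)
        # Fully connected output layer produces per-frame target scores.
        self.output = nn.Linear(256, num_targets)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) of raw audio.
        x = torch.relu(self.time_conv(waveform.unsqueeze(1)))   # (batch, filters, frames)
        x = x.transpose(1, 2)                                   # (batch, frames, filters)
        b, t, f = x.shape
        y = torch.relu(self.freq_conv(x.reshape(b * t, 1, f)))  # per-frame freq conv
        y = self.freq_pool(y).reshape(b, t, -1)                 # (batch, frames, freq_feat)
        out, _ = self.lstm(y)                                   # (batch, frames, 256)
        return self.output(out)                                 # per-frame logits


# Example: two utterances of one second of 16 kHz audio -> per-frame logits.
logits = RawWaveformCLDNN()(torch.randn(2, 16000))
```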

    Citations

    Publications citing this paper.
    SHOWING 1-10 OF 269 CITATIONS

    Brain-like emergent auditory learning: A developmental method

    VIEW 10 EXCERPTS
    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    Convolutional gated recurrent neural network incorporating spatial features for audio tagging

    VIEW 8 EXCERPTS
    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    Learning environmental sounds with end-to-end convolutional neural network

    • Yuji Tokozume, Tatsuya Harada
    • Computer Science
    • 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2017
    VIEW 10 EXCERPTS
    CITES BACKGROUND, RESULTS & METHODS
    HIGHLY INFLUENCED

    Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition

    VIEW 7 EXCERPTS
    CITES BACKGROUND & METHODS

    Residual Convolutional CTC Networks for Automatic Speech Recognition

    VIEW 6 EXCERPTS
    CITES METHODS & BACKGROUND
    HIGHLY INFLUENCED

    Learning Multiscale Features Directly from Waveforms

    VIEW 4 EXCERPTS
    CITES BACKGROUND
    HIGHLY INFLUENCED

    Novel Unsupervised Auditory Filterbank Learning Using Convolutional RBM for Speech Recognition

    VIEW 4 EXCERPTS
    CITES RESULTS & METHODS
    HIGHLY INFLUENCED

    From Speech Recognition to Language and Multimodal Processing

    VIEW 9 EXCERPTS
    CITES BACKGROUND & METHODS
    HIGHLY INFLUENCED

    CITATION STATISTICS

    • 23 Highly Influenced Citations

    • Averaged 69 Citations per year from 2017 through 2019

    References

    Publications referenced by this paper.
    SHOWING 1-10 OF 20 REFERENCES

    Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks

    VIEW 4 EXCERPTS

    Speech acoustic modeling from raw multichannel waveforms

    VIEW 9 EXCERPTS

    Learning a better representation of speech soundwaves using restricted Boltzmann machines

    VIEW 5 EXCERPTS
    HIGHLY INFLUENTIAL

    Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks

    • A. Mohamed, T. N. Sainath, B. Kingsbury
    • To appear in Proc. ICASSP
    • 2015

    Asynchronous stochastic optimization for sequence training of deep neural networks

    VIEW 1 EXCERPT

    Deep convolutional neural networks for LVCSR

    VIEW 1 EXCERPT

    Improvements to Deep Convolutional Neural Networks for LVCSR

    VIEW 1 EXCERPT