• Publications
  • Influence
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Deep Neural Networks for Acoustic Modeling in Speech Recognition
This paper provides an overview of this progress and repres nts the shared views of four research groups who have had recent successes in using deep neural networks for a coustic modeling in speech recognition.
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.
Deep Learning: Methods and Applications
  • L. Deng, Dong Yu
  • Computer Science
    Found. Trends Signal Process.
  • 12 June 2014
This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
Permutation invariant training of deep models for speaker-independent multi-talker speech separation
This work proposes a novel deep learning training criterion, named permutation invariant training (PIT), for speaker independent multi-talker speech separation, and finds that it compares favorably to non-negative matrix factorization (NMF), computational auditory scene analysis (CASA), and DPCL and generalizes well over unseen speakers and languages.
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs
This work shows empirically that in SGD training of deep neural networks, one can, at no or nearly no loss of accuracy, quantize the gradients aggressively—to but one bit per value—if the quantization error is carried forward across minibatches (error feedback), and implements data-parallel deterministically distributed SGD by combining this finding with AdaGrad.
Convolutional Neural Networks for Speech Recognition
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker independent
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription
This work investigates the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective to reduce the word error rate for speaker-independent transcription of phone calls.
Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine the classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and deep-belief-network