New types of deep neural network learning for speech recognition and related applications: an overview

@article{Deng2013NewTO,
  title={New types of deep neural network learning for speech recognition and related applications: an overview},
  author={Li Deng and Geoffrey E. Hinton and Brian Kingsbury},
  journal={2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2013},
  pages={8599--8603}
}
In this paper, we provide an overview of the invited and contributed papers presented at the special session at ICASSP-2013, entitled “New Types of Deep Neural Network Learning for Speech Recognition and Related Applications,” as organized by the authors. We also describe the historical context in which acoustic models based on deep neural networks have been developed. The technical overview of the papers presented in our special session is organized into five ways of improving deep learning… 

Speech recognition using deep neural network - recent trends
TLDR
The current technology related to speech recognition and its slow adoption of DNN-based approaches are described and a historical note on the technology development for speech recognition system is given.
Deep learning: from speech recognition to language and multimodal processing
  • L. Deng
  • Computer Science
    APSIPA Transactions on Signal and Information Processing
  • 2016
TLDR
The historical path to the transformative success of deep learning in speech recognition is reflected upon, a number of key issues in deep learning are discussed, and future directions are analyzed for perceptual tasks such as speech, image, and video, as well as for cognitive tasks involving natural language.
Convolutional Neural Networks for Speech Recognition
TLDR
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Ensemble deep learning for speech recognition
Deep learning systems have dramatically improved the accuracy of speech recognition, and various deep architectures and learning methods have been developed with distinct strengths and weaknesses in
Deep learning in acoustic modeling for Automatic Speech Recognition and Understanding - an overview -
  • I. Gavat, D. Militaru
  • Computer Science
    2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)
  • 2015
TLDR
Specific algorithms such as the Restricted Boltzmann Machine (RBM), Convolutional Neural Network (CNN), Autoencoder (AE), and Deep Belief Network (DBN) will be presented and evaluated, confirming the usefulness of the DL framework in ASRU.
Speech Recognition Using Deep Neural Networks: A Systematic Review
TLDR
A thorough examination is provided of the studies on speech applications conducted since 2006, when deep learning first arose as a new area of machine learning.
Convolutional neural network vectors for speaker recognition
TLDR
The convVectors method was the most robust, improving the baseline system by an average of 43% and recording an equal error rate (EER) of 1.05%, an important finding for understanding how deep learning models can be adapted to the problem of speaker recognition.
Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals
TLDR
This paper presents a novel audio dataset of English spoken digits which is used for classification tasks on spoken digits and speaker's gender and confirms that the networks are highly reliant on features marked as relevant by LRP.
Persian speech recognition using deep learning
TLDR
For the first time, the combination of deep belief network (DBN), for extracting features of speech signals, and Deep Bidirectional Long Short-Term Memory (DBLSTM) with Connectionist Temporal Classification (CTC) output layer is used to create an AM on the Farsdat Persian speech data set.
Manifold regularized deep neural networks
TLDR
A manifold learning based regularization framework for DNN training is presented to preserve the underlying low dimensional manifold based relationships amongst speech feature vectors as part of the optimization procedure for estimating network parameters.

References

Exploring convolutional neural network structures and optimization techniques for speech recognition
TLDR
This paper investigates several CNN architectures, including full and limited weight sharing, convolution along frequency and time axes, and stacking of several convolution layers, and develops a novel weighted softmax pooling layer so that the size in the pooled layer can be automatically learned.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Deep convolutional neural networks for LVCSR
TLDR
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.
Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
TLDR
This paper reports results of a DBN-pretrained context-dependent ANN/HMM system that is trained on two datasets much larger than any reported previously and that outperforms the best Gaussian Mixture Model Hidden Markov Model baseline.
Improving deep neural networks for LVCSR using rectified linear units and dropout
TLDR
Modelling deep neural networks with rectified linear unit (ReLU) non-linearities with minimal human hyper-parameter tuning on a 50-hour English Broadcast News task shows a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system.
Speech recognition with deep recurrent neural networks
TLDR
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Deep Neural Networks for Acoustic Modeling in Speech Recognition
TLDR
This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
We develop and present a novel deep convolutional neural network architecture, where heterogeneous pooling is used to provide constrained frequency-shift invariance in the speech spectrogram while
Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition
TLDR
The proposed CNN architecture is applied to speech recognition within the framework of a hybrid NN-HMM model, using local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance.
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TLDR
A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture is described that trains the DNN to produce a distribution over senones (tied triphone states) as its output and can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.