Corpus ID: 15641618

Ensemble deep learning for speech recognition

@inproceedings{Deng2014EnsembleDL,
  title={Ensemble deep learning for speech recognition},
  author={Li Deng and John C. Platt},
  booktitle={INTERSPEECH},
  year={2014}
}
Deep learning systems have dramatically improved the accuracy of speech recognition, and in recent years various deep architectures and learning methods have been developed, each with distinct strengths and weaknesses. [...] Convex optimization problems are formulated and solved, with analytical formulas derived for training the ensemble-learning parameters. Experimental results demonstrate a significant increase in phone recognition accuracy after stacking the deep learning subsystems that use different…
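The abstract's mention of convex problems with analytical solutions can be illustrated with a minimal sketch. It assumes the simplest form of linear stacking: one scalar weight per subsystem, fitted by least squares (optionally ridge-regularized), so that the 2x2 normal equations can be solved in closed form. The function names, the two-subsystem setup, and the `ridge` parameter are hypothetical and are not taken from the paper, whose actual formulation is elided in the excerpt.

```python
def fit_stacking_weights(preds_a, preds_b, targets, ridge=0.0):
    """Closed-form least-squares weights (w_a, w_b) such that
    w_a * preds_a + w_b * preds_b approximates targets.

    Each argument is a list of frames; each frame is a list of per-class
    scores (targets would typically be one-hot). This is an illustrative
    simplification, not the paper's method.
    """
    s_aa = s_ab = s_bb = s_at = s_bt = 0.0
    for frame_a, frame_b, frame_t in zip(preds_a, preds_b, targets):
        for a, b, t in zip(frame_a, frame_b, frame_t):
            s_aa += a * a
            s_ab += a * b
            s_bb += b * b
            s_at += a * t
            s_bt += b * t
    # Ridge term keeps the objective strictly convex when ridge > 0.
    s_aa += ridge
    s_bb += ridge
    det = s_aa * s_bb - s_ab * s_ab
    if det == 0.0:
        raise ValueError("subsystem outputs are collinear; use ridge > 0")
    # Cramer's rule on the 2x2 normal equations.
    w_a = (s_at * s_bb - s_ab * s_bt) / det
    w_b = (s_aa * s_bt - s_ab * s_at) / det
    return w_a, w_b


def stack(frame_a, frame_b, w_a, w_b):
    """Combine one frame of scores from the two subsystems."""
    return [w_a * a + w_b * b for a, b in zip(frame_a, frame_b)]


if __name__ == "__main__":
    # Toy scores from two hypothetical subsystems over two frames.
    preds_a = [[0.7, 0.3], [0.4, 0.6]]
    preds_b = [[0.9, 0.1], [0.2, 0.8]]
    targets = [[1.0, 0.0], [0.0, 1.0]]
    w_a, w_b = fit_stacking_weights(preds_a, preds_b, targets, ridge=1e-3)
    print(w_a, w_b, stack(preds_a[0], preds_b[0], w_a, w_b))
```

Because the objective is quadratic in the two weights, no iterative training is needed; a richer stacker (for example, a full weight matrix over concatenated subsystem outputs) would follow the same normal-equations pattern with a larger linear system.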
Speech recognition using deep neural network - recent trends
Deep neural networks (DNNs) are a special form of learning-based structure composed of multiple hidden layers of artificial neurons. These differ from the conventional artificial neural…
Multilingual Convolutional, Long Short-Term Memory, Deep Neural Networks for Low Resource Speech Recognition
TLDR
This paper uses CNNs and DNNs within the prediction and correction (PAC) architecture to calculate state probabilities for multilingual speech recognition, and proposes a model known as PAC-MCLDNN.
Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition
TLDR
This paper describes how to use knowledge distillation to combine acoustic models in a way that improves recognition accuracy significantly, can be implemented with standard training tools, and requires no additional complexity during recognition.
Speech Recognition Using Deep Neural Networks: A Systematic Review
TLDR
This review provides a thorough examination of the studies on speech applications conducted since 2006, when deep learning first arose as a new area of machine learning.
A Study on the Performance Evaluation of Machine Learning Models for Phoneme Classification
TLDR
This paper provides a comparative performance analysis of shallow and deep machine learning classifiers for the speech recognition task, using frame-level phoneme classification with DNN and LSTM networks.
Multi-Modal Hybrid Deep Neural Network for Speech Enhancement
TLDR
This paper proposes a novel deep learning model inspired by insights from human audio-visual perception, and compares the quality of enhanced speech from the hybrid models with that from traditional DNN and BiLSTM models.
Towards Robust Combined Deep Architecture for Speech Recognition: Experiments on TIMIT
TLDR
This paper proposes to combine CNN, GRU-RNN and DNN in a single deep architecture called Convolutional Gated Recurrent Unit, Deep Neural Network (CGDNN).
Review on Speech Recognition with Deep Learning Methods
The most common mode of communication between humans is speech. Because it is the preferred mode, humans would also like to use speech to interact with machines. That is why speech recognition has…
Sound Source Localization Using Deep Learning Models
TLDR
This study shows that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block-level accuracy of linear models in nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.
On the importance of modeling and robustness for deep neural network feature
  • Shuo-yiin Chang, S. Wegmann
  • Computer Science
  • 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
TLDR
Diagnostic analysis shows that a DNN-based feature representation that uses MFCC inputs (MFCC-DNN) is indeed superior to the corresponding MFCC baselines in the two matched scenarios where the recognition errors stem from an incorrect model, but the DNN-based features and MFCCs have nearly identical and poor performance in the mismatched scenario.

References

SHOWING 1-10 OF 45 REFERENCES
New types of deep neural network learning for speech recognition and related applications: an overview
TLDR
This paper provides an overview of the invited and contributed papers presented at the ICASSP-2013 special session, organized by the authors, entitled “New Types of Deep Neural Network Learning for Speech Recognition and Related Applications.”
Deep Neural Networks for Acoustic Modeling in Speech Recognition
TLDR
This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Investigation of full-sequence training of deep belief networks for speech recognition
TLDR
It is shown that the DBNs learned using the sequence-based training criterion outperform those with frame-based criterion using both three-layer and six-layer models, but the optimization procedure for the deeper DBN is more difficult for the former criterion.
Recurrent Neural Networks for Noise Reduction in Robust ASR
TLDR
This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, and demonstrates that the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
Speech recognition with deep recurrent neural networks
TLDR
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long-range context that empowers RNNs.
Hybrid speech recognition with Deep Bidirectional LSTM
TLDR
The hybrid approach with DBLSTM appears to be well suited for tasks where acoustic modelling predominates, and the improvement in word error rate over the deep network is modest, despite a great increase in frame-level accuracy.
A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
We develop and present a novel deep convolutional neural network architecture, where heterogeneous pooling is used to provide constrained frequency-shift invariance in the speech spectrogram while…
Sequence classification using the high-level features extracted from deep neural networks
  • L. Deng, Jianshu Chen
  • Computer Science
  • 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
TLDR
This work reports on the construction of a diverse set of DNN features, including the vectors extracted from the output layer and from various hidden layers in the DNN, and applies these features as the inputs to four types of classifiers to carry out the identical sequence classification task of phone recognition.
Scalable stacking and learning for building deep architectures
  • L. Deng, Dong Yu, John C. Platt
  • Computer Science
  • 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2012
TLDR
The Deep Stacking Network (DSN) is presented, which overcomes the problem of parallelizing learning algorithms for deep architectures and provides a method of stacking simple processing modules in building deep architectures, with a convex learning problem in each module.
Making Deep Belief Networks effective for large vocabulary continuous speech recognition
TLDR
This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task.