Corpus ID: 15641618

Ensemble deep learning for speech recognition

Li Deng and John C. Platt

Deep learning systems have dramatically improved the accuracy of speech recognition, and in recent years various deep architectures and learning methods have been developed, each with distinct strengths and weaknesses. […] Convex optimization problems are formulated and solved, with analytical formulas derived for training the ensemble-learning parameters. Experimental results demonstrate a significant increase in phone recognition accuracy after stacking the deep learning subsystems that use different…
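The stacking step the abstract describes can be illustrated with a minimal sketch (all shapes, data, and the ridge term are hypothetical; the closed-form least-squares solve stands in for the paper's "analytical formulas" for the convex ensemble-weighting problem):

```python
import numpy as np

# Hypothetical frame-level posteriors from two deep-learning subsystems
# (e.g. a DNN and an RNN), each of shape (num_frames, num_classes).
rng = np.random.default_rng(0)
num_frames, num_classes = 200, 10
p1 = rng.random((num_frames, num_classes))
p2 = rng.random((num_frames, num_classes))
targets = np.eye(num_classes)[rng.integers(0, num_classes, num_frames)]

# Stack the subsystem outputs side by side as features for a linear combiner.
X = np.hstack([p1, p2])                      # (num_frames, 2 * num_classes)

# Training the combiner weights W to minimize ||X W - targets||^2 is a
# convex problem with the closed-form (ridge-regularized) solution below.
lam = 1e-3                                   # small regularizer for stability
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ targets)

ensemble_scores = X @ W                      # combined frame-level scores
predictions = ensemble_scores.argmax(axis=1)
print(predictions.shape)                     # (200,)
```

Because the combiner is linear in W, the training objective is convex and the weights are obtained in one solve rather than by iterative gradient descent.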


Advanced Convolutional Neural Network-Based Hybrid Acoustic Models for Low-Resource Speech Recognition

This paper presents contributions that combine CNNs and conventional RNNs with gated, highway, and residual networks to mitigate the above problems, and explores optimal neural network structures and training strategies for the proposed models.

Speech recognition using deep neural network - recent trends

The current technology related to speech recognition and its slow adoption of DNN-based approaches are described, and a historical note on the development of speech recognition systems is given.

Distilling Knowledge from Ensembles of Neural Networks for Speech Recognition

This paper describes how to use knowledge distillation to combine acoustic models in a way that improves recognition accuracy significantly, can be implemented with standard training tools, and requires no additional complexity during recognition.
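The distillation idea summarized above can be sketched as follows (the logits, temperature, and ensemble size are hypothetical; this shows only the generic soft-target construction and cross-entropy loss, not that paper's specific recipe):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)    # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits from an ensemble of two teacher acoustic models
# and one student model, for 4 frames over 5 senone classes.
rng = np.random.default_rng(1)
teacher_logits = [rng.normal(size=(4, 5)) for _ in range(2)]
student_logits = rng.normal(size=(4, 5))

# The ensemble's knowledge is its averaged soft output at temperature T.
T = 2.0
soft_targets = np.mean([softmax(l, T) for l in teacher_logits], axis=0)
student_probs = softmax(student_logits, T)

# Cross-entropy of the student against the soft targets: the distillation
# loss minimized with standard training tools; recognition is unchanged,
# since only the single student model is used at decode time.
distill_loss = -np.mean(
    np.sum(soft_targets * np.log(student_probs + 1e-12), axis=1)
)
```

The appeal noted in the summary is that this needs no new machinery: the soft targets are just another set of labels for ordinary cross-entropy training.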

A Study on the Performance Evaluation of Machine Learning Models for Phoneme Classification

This paper provides a comparative performance analysis of shallow and deep machine learning classifiers on frame-level phoneme classification for speech recognition, using DNN and LSTM networks.

Multi-Modal Hybrid Deep Neural Network for Speech Enhancement

This paper proposes a novel deep learning model inspired by insights from human audio visual perception, and compares the quality of enhanced speech from the hybrid models with those from traditional DNN and BiLSTM models.

Towards Robust Combined Deep Architecture for Speech Recognition: Experiments on TIMIT

This paper proposes to combine CNN, GRU-RNN and DNN in a single deep architecture called Convolutional Gated Recurrent Unit, Deep Neural Network (CGDNN).

Review on Speech Recognition with Deep Learning Methods

A three-stage neural integrated model is used for speech signal enhancement, with a decomposition-integrated HMM model for speech feature transformation; experimental results show that the system is able to recognize words at sufficiently high accuracy.

Sound Source Localization Using Deep Learning Models

This study shows that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block level accuracy of linear models on nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.

On the importance of modeling and robustness for deep neural network feature

  • Shuo-yiin Chang, S. Wegmann
  • Computer Science
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
Diagnostic analysis shows that a DNN-based feature representation that uses MFCC inputs (MFCC-DNN) is indeed superior to the corresponding MFCC baselines in the two matched scenarios, where the source of recognition errors is an incorrect model, but the DNN-based features and MFCCs have nearly identical and poor performance in the mismatched scenario.



New types of deep neural network learning for speech recognition and related applications: an overview

An overview of the invited and contributed papers presented at the special session at ICASSP-2013, entitled “New Types of Deep Neural Network Learning for Speech Recognition and Related Applications,” as organized by the authors is provided.

Deep Neural Networks for Acoustic Modeling in Speech Recognition

This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.

Investigation of full-sequence training of deep belief networks for speech recognition

It is shown that the DBNs learned using the sequence-based training criterion outperform those with the frame-based criterion using both three-layer and six-layer models, but the optimization procedure for the deeper DBN is more difficult for the former criterion.

Recurrent Neural Networks for Noise Reduction in Robust ASR

This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, and demonstrates the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.

Speech recognition with deep recurrent neural networks

This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.

Hybrid speech recognition with Deep Bidirectional LSTM

The hybrid approach with DBLSTM appears to be well suited for tasks where acoustic modelling predominates, and the improvement in word error rate over the deep network is modest, despite a great increase in frame-level accuracy.

Sequence classification using the high-level features extracted from deep neural networks

  • L. Deng, Jianshu Chen
  • Computer Science
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
This work reports on the construction of a diverse set of DNN features, including the vectors extracted from the output layer and from various hidden layers in the DNN, and applies these features as the inputs to four types of classifiers to carry out the identical sequence classification task of phone recognition.

Scalable stacking and learning for building deep architectures

  • L. Deng, Dong Yu, John C. Platt
  • Computer Science
    2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2012
The Deep Stacking Network (DSN) is presented, which overcomes the problem of parallelizing learning algorithms for deep architectures and provides a method of stacking simple processing modules in building deep architectures, with a convex learning problem in each module.
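A single DSN module of the kind summarized above can be sketched as follows (dimensions are hypothetical, and random fixed lower-layer weights stand in for trained ones; the convex part is the closed-form solve for the upper-layer weights):

```python
import numpy as np

rng = np.random.default_rng(2)
num_samples, input_dim, hidden_dim, num_classes = 100, 16, 32, 8
X = rng.normal(size=(num_samples, input_dim))                        # inputs
T = np.eye(num_classes)[rng.integers(0, num_classes, num_samples)]   # targets

def dsn_module(features, targets, hidden_dim, rng):
    """One DSN module: nonlinear hidden layer, then a convex
    least-squares problem for the upper-layer weights U."""
    W = rng.normal(size=(features.shape[1], hidden_dim))  # lower weights (fixed here)
    H = 1.0 / (1.0 + np.exp(-features @ W))               # sigmoid hidden units
    # Convex learning problem: min_U ||H U - targets||^2, solved in closed
    # form (with a small ridge term for numerical stability).
    U = np.linalg.solve(H.T @ H + 1e-3 * np.eye(hidden_dim), H.T @ targets)
    return H @ U

# Stacking: each module's input is the raw input concatenated with the
# previous module's output, building depth module by module.
y1 = dsn_module(X, T, hidden_dim, rng)
y2 = dsn_module(np.hstack([X, y1]), T, hidden_dim, rng)
print(y2.shape)   # (100, 8)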

Making Deep Belief Networks effective for large vocabulary continuous speech recognition

This paper explores the performance of DBNs in a state-of-the-art LVCSR system, showing improvements over Multi-Layer Perceptrons (MLPs) and GMM/HMMs across a variety of features on an English Broadcast News task.

Recent advances in deep learning for speech research at Microsoft

  • L. Deng, Jinyu Li, A. Acero
  • Computer Science
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
An overview of the work by Microsoft speech researchers since 2009 is provided, focusing on more recent advances which shed light on the basic capabilities and limitations of current deep learning technology.