Recent advances in deep learning for speech research at Microsoft

@inproceedings{Deng2013RecentAI,
  title={Recent advances in deep learning for speech research at Microsoft},
  author={Li Deng and Jinyu Li and Jui Ting Huang and Kaisheng Yao and Dong Yu and Frank Seide and Michael L. Seltzer and Geoffrey Zweig and Xiaodong He and J. Williams and Yifan Gong and Alex Acero},
  booktitle={2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  year={2013},
  pages={8604--8608}
}
  • L. Deng, Jinyu Li, +9 authors A. Acero
  • Published 2013
  • Computer Science
  • 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
Deep learning is becoming a mainstream technology for speech recognition at industrial scale. [...] Potential improvement of these techniques and future research directions are discussed.
Deep learning in acoustic modeling for Automatic Speech Recognition and Understanding - an overview -
  • I. Gavat, D. Militaru
  • Computer Science
  • 2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)
  • 2015
TLDR
Specific algorithms such as the Restricted Boltzmann Machine (RBM), Convolutional Neural Network (CNN), Autoencoder (AE), and Deep Belief Network (DBN) will be presented and evaluated, confirming the usefulness of the DL framework in ASRU.
Deep learning: from speech recognition to language and multimodal processing
  • L. Deng
  • Computer Science
  • APSIPA Transactions on Signal and Information Processing
  • 2016
TLDR
The historical path to this transformative success of deep learning in speech recognition is reflected, and a number of key issues in deep learning are discussed, and future directions are analyzed for perceptual tasks such as speech, image, and video, as well as for cognitive tasks involving natural language.
Deep learning of split temporal context for automatic speech recognition
TLDR
This work proposes an alternative solution that splits the temporal context into blocks, each learned with a separate deep model, and demonstrates that this approach significantly reduces the number of parameters compared to the classical deep learning procedure, and obtains better results on the TIMIT dataset.
Speech Recognition Using Deep Neural Networks: A Systematic Review
TLDR
A thorough examination of the different studies that have been conducted since 2006, when deep learning first arose as a new area of machine learning, for speech applications is provided.
Ensemble deep learning for speech recognition
Deep learning systems have dramatically improved the accuracy of speech recognition, and various deep architectures and learning methods have been developed with distinct strengths and weaknesses in [...]
A Survey of Deep Learning Techniques in Speech Recognition
TLDR
A survey is provided on the application of three deep learning architectures in the field of speech recognition, namely, Deep Belief Networks, Convolutional Neural Networks and Recurrent Neural Networks.
Improving speech recognition using data augmentation and acoustic model fusion
TLDR
This work proposes a new Deep Neural Network (DNN) speech recognition architecture which takes advantage of both DA and EM approaches in order to improve the prediction accuracy of the system.
New types of deep neural network learning for speech recognition and related applications: an overview
TLDR
An overview of the invited and contributed papers presented at the special session at ICASSP-2013, entitled “New Types of Deep Neural Network Learning for Speech Recognition and Related Applications,” as organized by the authors is provided.
Speech recognition using deep neural network - recent trends
Deep neural networks (DNN) are special forms of learning-based structures composed of multiple hidden layers formed by artificial neurons. These are different to the conventional artificial neural [...]
Ensemble Learning Approaches in Speech Recognition
TLDR
Ensemble learning for speech recognition has been largely fruitful, and it is expected to continue progress along with the advances in machine learning, speech and language modeling, as well as computing technology.

References

SHOWING 1-10 OF 68 REFERENCES
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Investigation of full-sequence training of deep belief networks for speech recognition
TLDR
It is shown that the DBNs learned using the sequence-based training criterion outperform those with frame-based criterion using both three-layer and six-layer models, but the optimization procedure for the deeper DBN is more difficult for the former criterion.
Machine Learning Paradigms for Speech Recognition: An Overview
  • L. Deng, Xiao Li
  • Computer Science
  • IEEE Transactions on Audio, Speech, and Language Processing
  • 2013
TLDR
This overview article provides readers with an overview of modern ML techniques as utilized in the current and as relevant to future ASR research and systems, and presents and analyzes recent developments of deep learning and learning with sparse representations.
Towards deeper understanding: Deep convex networks for semantic utterance classification
TLDR
The DCN-based method produces higher SUC accuracy than the Boosting-based discriminative classifier with word trigrams, and experimental results obtained on a domain classification task for spoken language understanding demonstrate the effectiveness of DCNs.
An investigation of deep neural networks for noise robust speech recognition
TLDR
The noise robustness of DNN-based acoustic models can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation and can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training.
Use of kernel deep convex networks and end-to-end learning for spoken language understanding
TLDR
Experimental results demonstrate dramatic error reduction achieved by the K-DCN over both the Boosting-based baseline and the DCN on a domain classification task of SLU, especially when a highly correlated set of features extracted from search query click logs is used.
Deep Neural Networks for Acoustic Modeling in Speech Recognition
TLDR
This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Multilingual acoustic models using distributed deep neural networks
TLDR
Experimental results for cross- and multi-lingual network training of eleven Romance languages on 10k hours of data in total show average relative gains over the monolingual baselines, but additional gain from jointly training the languages on all data comes at an increased training time of roughly four weeks.
Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition
TLDR
It is shown that pre-training can initialize weights to a point in the space where fine-tuning can be effective and thus is crucial in training deep structured models and in the recognition performance of a CD-DBN-HMM based large-vocabulary speech recognizer.
Adaptation of context-dependent deep neural networks for automatic speech recognition
TLDR
On a large vocabulary speech recognition task, a stochastic gradient ascent implementation of the fDLR and the top hidden layer adaptation is shown to reduce word error rates (WERs) by 17% and 14%, respectively, compared to the baseline DNN performances.