1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs
This work shows empirically that in SGD training of deep neural networks, gradients can be quantized aggressively, to a single bit per value, at little or no loss of accuracy, provided the quantization error is carried forward across minibatches (error feedback); combining this finding with AdaGrad yields a data-parallel, deterministically distributed implementation of SGD.
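The core idea of 1-bit quantization with error feedback can be sketched as follows. This is an illustrative reconstruction, not the paper's exact implementation; in particular, reconstructing each half from its mean magnitude is one common choice of decoder, and the function name is hypothetical.

```python
import numpy as np

def one_bit_quantize(grad, error):
    """Quantize a gradient to one bit per value, with error feedback.

    `error` is the residual carried over from the previous minibatch;
    it is added back before quantizing so that quantization error is
    not lost but corrected over time.
    """
    corrected = grad + error              # fold in last step's residual
    sign = corrected >= 0                 # the single transmitted bit
    pos = corrected[sign]
    neg = corrected[~sign]
    # Decode each bit as the mean of its half (illustrative choice).
    q = np.where(sign,
                 pos.mean() if pos.size else 0.0,
                 neg.mean() if neg.size else 0.0)
    new_error = corrected - q             # carry forward to next minibatch
    return q, new_error
```

Note that `q + new_error` always equals the error-corrected gradient, so no information is permanently discarded, only deferred.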
Feature engineering in Context-Dependent Deep Neural Networks for conversational speech transcription
- F. Seide, Gang Li, Xie Chen, Dong Yu
- Computer Science · IEEE Workshop on Automatic Speech Recognition…
- 1 December 2011
This work investigates the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective to reduce the word error rate for speaker-independent transcription of phone calls.
Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine the classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and deep-belief-network…
An introduction to computational networks and the computational network toolkit (invited talk)
The Computational Network Toolkit (CNTK), an implementation of computational networks (CNs) that supports both GPU and CPU, is introduced; the architecture and key components of CNTK, the command-line options for using it, and the network-definition and model-editing languages are described.
Marian: Fast Neural Machine Translation in C++
Marian is an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs that can achieve high training and translation speed.
CNTK: Microsoft's Open-Source Deep-Learning Toolkit
This tutorial will introduce the Computational Network Toolkit, or CNTK, Microsoft's cutting-edge open-source deep-learning toolkit for Windows and Linux, and show what typical uses look like for tasks such as image recognition, sequence-to-sequence modeling, and speech recognition.
KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
- Dong Yu, K. Yao, Hang Su, Gang Li, F. Seide
- Computer Science · IEEE International Conference on Acoustics…
- 26 May 2013
Experiments demonstrate that the proposed adaptation technique provides 2%–30% relative error reduction over already very strong speaker-independent CD-DNN-HMM systems, across different adaptation sets and under both supervised and unsupervised adaptation setups.
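A minimal sketch of how KL-divergence regularization is commonly formulated for this kind of adaptation (the exact weighting in the paper may differ): adding a KL term that penalizes divergence from the speaker-independent (SI) model's posteriors is equivalent to training toward an interpolated target distribution,

```latex
\hat{p}(y \mid x) \;=\; (1-\rho)\,\tilde{p}(y \mid x) \;+\; \rho\, p_{\mathrm{SI}}(y \mid x),
```

where $\tilde{p}(y \mid x)$ is the ground-truth (e.g. one-hot) target, $p_{\mathrm{SI}}(y \mid x)$ is the SI model's output, and $\rho \in [0,1]$ controls how strongly the adapted model is held close to the SI model; standard cross-entropy training then proceeds against $\hat{p}$.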
Achieving Human Parity on Automatic Chinese to English News Translation
It is found that Microsoft's latest neural machine translation system has reached a new state of the art, and that its translation quality is at human parity when compared with professional human translations.
Recent advances in deep learning for speech research at Microsoft
An overview of the work by Microsoft speech researchers since 2009 is provided, focusing on more recent advances that shed light on the basic capabilities and limitations of current deep learning technology.
Achieving Human Parity in Conversational Speech Recognition
The human error rate on the widely used NIST 2000 test set is measured, and the latest automated speech recognition system reaches human parity, edging past the human benchmark and establishing a new state of the art.