Convolutional Neural Networks for Speech Recognition
TLDR
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
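The key architectural idea here is that filters need not be shared across the whole frequency axis. Below is a minimal NumPy sketch contrasting full weight sharing with a limited-weight-sharing (LWS) convolution along frequency; the filter width, filter count, and section size are illustrative assumptions, not values from the paper.

    # Minimal sketch (not the authors' code): limited weight sharing (LWS) convolution
    # along the frequency axis, contrasted with full weight sharing.
    import numpy as np

    def full_weight_sharing_conv(bands, filters):
        """bands: (num_bands,), filters: (num_filters, width). One shared filter set
        slides over every frequency position."""
        num_filters, width = filters.shape
        positions = len(bands) - width + 1
        out = np.empty((num_filters, positions))
        for p in range(positions):
            out[:, p] = filters @ bands[p:p + width]
        return out

    def limited_weight_sharing_conv(bands, section_filters, section_size):
        """section_filters: list of (num_filters, width) arrays, one per frequency
        section; filters are shared only within their own section, so low- and
        high-frequency patterns get separate weights."""
        outputs = []
        for s, filters in enumerate(section_filters):
            section = bands[s * section_size:(s + 1) * section_size]
            outputs.append(full_weight_sharing_conv(section, filters))
        return outputs

    # Example: 40 mel filterbank energies for one speech frame.
    bands = np.random.randn(40)
    shared = full_weight_sharing_conv(bands, np.random.randn(8, 5))
    limited = limited_weight_sharing_conv(
        bands, [np.random.randn(8, 5) for _ in range(4)], section_size=10)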
Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition
TLDR
The proposed CNN architecture is applied to speech recognition within the framework of the hybrid NN-HMM model, using local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance.
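A brief sketch of the pooling step the summary refers to: max-pooling along the frequency axis of the convolutional feature maps, which makes the representation tolerant to small speaker-dependent frequency shifts (e.g., from vocal tract length differences). The pool size and shapes are assumptions for illustration.

    # Illustrative only, not the paper's implementation: non-overlapping max-pooling
    # over the frequency dimension of convolutional feature maps.
    import numpy as np

    def max_pool_frequency(feature_maps, pool_size):
        """feature_maps: (num_filters, num_freq_positions)."""
        num_filters, positions = feature_maps.shape
        trimmed = positions - positions % pool_size        # drop the ragged tail
        reshaped = feature_maps[:, :trimmed].reshape(num_filters, -1, pool_size)
        return reshaped.max(axis=2)

    # A small frequency shift in the input moves activations within a pooling region,
    # so the pooled representation changes little.
    maps = np.random.randn(8, 36)
    pooled = max_pool_frequency(maps, pool_size=3)         # -> shape (8, 12)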
Exploring convolutional neural network structures and optimization techniques for speech recognition
TLDR
This paper investigates several CNN architectures, including full and limited weight sharing, convolution along the frequency and time axes, and stacking of several convolution layers, and develops a novel weighted softmax pooling layer so that the size of the pooled layer can be automatically learned.
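The weighted softmax pooling idea can be illustrated roughly as follows; the exact formulation in the paper may differ. Each position in a pooling region gets a learnable score, and the pooled value is the softmax-weighted sum, so the layer can interpolate between average-like and max-like pooling and effectively learn its own pooling behavior.

    # Rough sketch of softmax-weighted pooling over one pooling region.
    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def weighted_softmax_pool(region, scores):
        """region, scores: 1-D arrays of the same length covering one pooling region.
        Learnable scores control how sharply the pooling focuses on the peak."""
        return float(softmax(scores) @ region)

    region = np.array([0.2, 1.5, 0.4, 0.1])
    print(weighted_softmax_pool(region, scores=np.zeros(4)))    # flat scores ~ average pooling
    print(weighted_softmax_pool(region, scores=region * 20.0))  # peaked scores ~ max pooling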
Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
TLDR
A new fast speaker adaptation method is proposed for the hybrid NN-HMM speech recognition model that achieves over 10% relative reduction in phone error rate using only seven utterances for adaptation.
Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition
TLDR
A general adaptation scheme for DNNs based on discriminant condition codes is proposed, in which the codes are fed directly into various layers of a pre-trained DNN through a new set of connection weights; these codes are quite effective for adapting large DNN models using only a small amount of adaptation data.
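A minimal sketch of the condition/speaker-code mechanism described above, under assumed shapes and a single adapted layer: a small code vector is injected into a hidden layer of the pre-trained DNN through an extra weight matrix, and only the code (and optionally the new connection weights) is estimated from the adaptation data while the original network stays frozen.

    # Illustrative shapes and a single adapted layer; not the paper's exact model.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def adapted_hidden_layer(x, W, b, B, code):
        """x: (in_dim,), W: (hid_dim, in_dim), b: (hid_dim,),
        B: (hid_dim, code_dim), code: (code_dim,)."""
        return sigmoid(W @ x + B @ code + b)

    in_dim, hid_dim, code_dim = 440, 1024, 50
    rng = np.random.default_rng(0)
    W = rng.standard_normal((hid_dim, in_dim)) * 0.01    # frozen, speaker-independent weights
    b = np.zeros(hid_dim)
    B = rng.standard_normal((hid_dim, code_dim)) * 0.01  # connection weights for the code
    code = np.zeros(code_dim)                            # per-speaker code, learned from a few utterances
    h = adapted_hidden_layer(rng.standard_normal(in_dim), W, b, B, code)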
Computer aided pronunciation learning system using speech recognition techniques
TLDR
This paper describes a speech-enabled Computer Aided Pronunciation Learning (CAPL) system, HAFSS©, developed for teaching Arabic pronunciation to non-native speakers; it correctly identified 62.4% of pronunciation errors and falsely accepted 14.9% of total errors.
A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
We develop and present a novel deep convolutional neural network architecture, where heterogeneous pooling is used to provide constrained frequency-shift invariance in the speech spectrogram while …
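A rough sketch of the heterogeneous pooling idea, with the per-filter pool sizes chosen purely for illustration: different groups of feature maps are pooled along frequency with different pool sizes, so some maps gain more shift invariance while others retain the finer frequency detail needed to keep confusable phones apart.

    # Illustrative sketch of heterogeneous pooling along the frequency axis.
    import numpy as np

    def pool_1d(row, pool_size):
        trimmed = len(row) - len(row) % pool_size
        return row[:trimmed].reshape(-1, pool_size).max(axis=1)

    def heterogeneous_pool(feature_maps, pool_sizes):
        """feature_maps: (num_filters, num_freq_positions); pool_sizes: one size per filter."""
        return [pool_1d(feature_maps[i], p) for i, p in enumerate(pool_sizes)]

    maps = np.random.randn(4, 24)
    pooled = heterogeneous_pool(maps, pool_sizes=[1, 2, 4, 6])  # mixed invariance levels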
Deep segmental neural networks for speech recognition
TLDR
The deep segmental neural network (DSNN) is proposed, a segmental model that uses DNNs to estimate the acoustic scores of phonemic or sub-phonemic segments of variable length, allowing the DSNN to represent each segment as a single unit in which frames are made dependent on each other.
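A simplified sketch of segment-level scoring in the spirit of the DSNN (the published model differs in detail, e.g., in how frame-level outputs are combined): a variable-length run of frames is summarized into a fixed-size segment representation and scored against phone classes as a single unit.

    # Simplified segment-level scorer; shapes and the mean-pooling summary are assumptions.
    import numpy as np

    def segment_score(frames, W1, b1, W2, b2):
        """frames: (num_frames, feat_dim) for one hypothesized segment.
        Returns log-probabilities over phone classes for the whole segment."""
        seg_repr = frames.mean(axis=0)                    # fixed-size summary of the segment
        h = np.tanh(W1 @ seg_repr + b1)
        logits = W2 @ h + b2
        return logits - np.log(np.exp(logits).sum())      # log-softmax over phone classes

    feat_dim, hid, num_phones = 40, 128, 48
    rng = np.random.default_rng(1)
    W1, b1 = rng.standard_normal((hid, feat_dim)) * 0.1, np.zeros(hid)
    W2, b2 = rng.standard_normal((num_phones, hid)) * 0.1, np.zeros(num_phones)
    scores = segment_score(rng.standard_normal((7, feat_dim)), W1, b1, W2, b2)  # 7-frame segment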
Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition
TLDR
Experimental results on the TIMIT dataset demonstrate that both methods are quite effective at adapting CNN-based acoustic models and that combining the two methods achieves even better performance.
Direct adaptation of hybrid DNN/HMM model for fast speaker adaptation in LVCSR based on speaker code
TLDR
The proposed direct speaker-code-based adaptation method is evaluated on the large-scale 320-hour Switchboard task, where it leads to up to 8% relative reduction in word error rate using only a very small number of adaptation utterances per speaker.
...