• Corpus ID: 65239399

Deep Neural Network Acoustic Models for ASR

@phdthesis{Mohamed2014DeepNN,
  title={Deep Neural Network Acoustic Models for ASR},
  author={Abdel-rahman Mohamed},
  school={University of Toronto},
  year={2014}
}
Abdel-rahman Mohamed, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2014.

Automatic speech recognition (ASR) is a key core technology for the information age. ASR systems have evolved from discriminating among isolated digits to recognizing telephone-quality, spontaneous speech, allowing for a growing number of practical applications in various sectors. Nevertheless, there are still serious challenges facing ASR…
Towards Robust Combined Deep Architecture for Speech Recognition : Experiments on TIMIT
TLDR
This paper proposes to combine CNN, GRU-RNN and DNN in a single deep architecture called Convolutional Gated Recurrent Unit, Deep Neural Network (CGDNN).
Robust End to End Acoustic Model Based on Deep Similarity Network
TLDR
A new robust speech recognition model is proposed to tackle performance degradation in the presence of acoustic interference, and parameter sharing between clean and noisy speech is suggested to improve the generalization capability of the model.
Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages
TLDR
This paper presents a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains and examines two low-level signal descriptors (spectral and cepstral features) for this task.
Improve Data Utilization with Two-stage Learning in CNN-LSTM-based Voice Activity Detection
TLDR
This work proposes a two-stage training strategy that achieves a relative improvement of over 2.89% compared with the original CLDNN under the noise-matched condition and over 1.07% under the unmatched condition. The method shows clear advantages in discriminative and generalization ability over the compared approaches at different scales of training data, especially on small datasets.
A Comparative Study of Features for Acoustic Cough Detection Using Deep Architectures
  • Igor Miranda, A. Diacon, T. Niesler
  • Computer Science
    2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
  • 2019
TLDR
Although MFCC performance is improved by sinusoidal liftering, STFT and MFB lead to better results; an improvement exceeding 7% in the area under the receiver operating characteristic curve is achieved across all classifiers.
Hybrid context dependent CD-DNN-HMM keywords spotting on continuous speech
  • Hinda Dridi, K. Ouni
  • Computer Science, Economics
    2017 International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)
  • 2017
TLDR
A systematic procedure is presented for implementing a two-stage keyword spotting (KWS) system using a CD-DNN-HMM model built with the Kaldi toolkit and a classification and regression tree (CART) implemented in MATLAB.
Acoustic scene classification using auditory datasets
TLDR
A project to classify pre-defined acoustic scenes is discussed, and improvised data analysis and data augmentation techniques for audio datasets, such as frequency masking and random frequency-time stretching, are used and explained.
Deep learning for spoken language identification
TLDR
Master's thesis, Aalto University, School of Science. Master's programme: Computer, Communication and Information Sciences; major: Computer Science (code SCI3042).
Speech-Based CALL System to Evaluate the Meaning and Grammar Errors in English Spoken Utterance
TLDR
The universal sentence encoder was used to encode each sentence into a 512-dimensional vector representing the semantic features of the response, and a binary embedding approach was used to produce a vector of 438 binary features from the response.
A depthwise separable convolutional neural network for keyword spotting on an embedded system
TLDR
A keyword spotting algorithm implemented on an embedded system using a depthwise separable convolutional neural network classifier is reported, finding that quantization of pre-trained networks using mixed and dynamic fixed point principles could reduce the memory footprint and computational requirements without lowering classification accuracy.

References

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Speech Recognition Using Augmented Conditional Random Fields
TLDR
A new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed, which addresses some limitations of HMMs while maintaining many of the aspects which have made them successful.
The Application of Hidden Markov Models in Speech Recognition
TLDR
The aim of this review is first to present the core architecture of an HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance.
Acoustic Modeling Using Deep Belief Networks
TLDR
It is shown that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters.
Large Margin Hidden Markov Models for Automatic Speech Recognition
TLDR
This work proposes a learning algorithm based on the goal of margin maximization in continuous density hidden Markov models for automatic speech recognition (ASR) using Gaussian mixture models, and obtains competitive results for phonetic recognition on the TIMIT speech corpus.
The acoustic-modeling problem in automatic speech recognition
TLDR
This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space such as R^N and explores the trade-off between packing a lot of information into such sequences and being able to model them accurately.
Factor analysed hidden Markov models for speech recognition
Improvements to Deep Convolutional Neural Networks for LVCSR
TLDR
A deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features is conducted and an effective strategy to use dropout during Hessian-free sequence training is introduced.
fMPE: discriminatively trained features for speech recognition
MPE (minimum phone error) is a previously introduced technique for discriminative training of HMM parameters. fMPE applies the same objective function to the features, transforming the data with a
Connectionist Speech Recognition: A Hybrid Approach
From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous