Deep Neural Network Acoustic Models for ASR
@inproceedings{Mohamed2014DeepNN, title={Deep Neural Network Acoustic Models for ASR}, author={Abdel-rahman Mohamed}, year={2014} }
Deep Neural Network acoustic models for ASR. Abdel-rahman Mohamed. Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2014.
Automatic speech recognition (ASR) is a core technology for the information age. ASR systems have evolved from discriminating among isolated digits to recognizing telephone-quality, spontaneous speech, allowing for a growing number of practical applications in various sectors. Nevertheless, there are still serious challenges facing ASR…
Figures and Tables from this paper
[Figures 2.1–6.9 and tables 3.2–7.1]
27 Citations
Towards Robust Combined Deep Architecture for Speech Recognition : Experiments on TIMIT
- Computer Science
- 2020
This paper proposes to combine CNN, GRU-RNN and DNN in a single deep architecture called Convolutional Gated Recurrent Unit, Deep Neural Network (CGDNN).
Robust End to End Acoustic Model Based on Deep Similarity Network
- Computer Science, 2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP)
- 2019
A new robust speech recognition model is proposed to tackle performance degradation in the presence of acoustic interference, and the idea of parameter sharing between clean speeches and noisy ones is suggested to improve the generalization capability of the model.
Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages
- Computer Science, INTERSPEECH
- 2020
This paper presents a set of experiments to investigate the impact of domain mismatch on the performance of neural LID systems for a subset of six Slavic languages across two domains and examines two low-level signal descriptors (spectral and cepstral features) for this task.
Improve Data Utilization with Two-stage Learning in CNN-LSTM-based Voice Activity Detection
- Computer Science, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- 2019
This work proposes a two-stage training strategy that achieves over 2.89% relative improvement over the original CLDNN on the noise-matched condition and over 1.07% on the unmatched condition, and shows that the method has clear advantages in discriminative ability and generalization over compared approaches at different training-data scales, especially on small datasets.
A Comparative Study of Features for Acoustic Cough Detection Using Deep Architectures*
- Computer Science, 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
- 2019
Although MFCC performance is improved by sinusoidal liftering, STFT and MFB lead to better results, with an improvement exceeding 7% in the area under the receiver operating characteristic curve across all classifiers.
Hybrid context dependent CD-DNN-HMM keywords spotting on continuous speech
- Computer Science, Economics, 2017 International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)
- 2017
A systematic procedure to implement a two-stage keyword spotting system (KWS) using a CD-DNN-HMM model built with the Kaldi toolkit and a classification and regression tree (CART) implemented in MATLAB.
Acoustic scene classification using auditory datasets
- Computer Science, ArXiv
- 2021
A project to classify pre-defined acoustic scenes is discussed and explained, using improved data analysis and data augmentation for audio datasets, such as frequency masking and random frequency-time stretching.
Deep learning for spoken language identification
- Education
- 2020
Master's thesis, School of Science (Computer, Communication and Information Sciences programme, Computer Science major, code SCI3042), Aalto University, P.O. Box 11000, FI-00076 Aalto.
Speech-Based CALL System to Evaluate the Meaning and Grammar Errors in English Spoken Utterance
- Computer Science
- 2019
The universal sentence encoder was used to encode each sentence into a 512-dimensional vector representing the semantic features of the response, and a binary embedding approach was used to produce a 438-dimensional binary feature vector from the response.
A depthwise separable convolutional neural network for keyword spotting on an embedded system
- Computer Science, EURASIP J. Audio Speech Music. Process.
- 2020
A keyword spotting algorithm implemented on an embedded system using a depthwise separable convolutional neural network classifier is reported, finding that quantization of pre-trained networks using mixed and dynamic fixed point principles could reduce the memory footprint and computational requirements without lowering classification accuracy.
References
(showing 1–10 of 98 references)
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
- Computer Science, IEEE Signal Processing Magazine
- 2012
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Speech Recognition Using Augmented Conditional Random Fields
- Computer Science, IEEE Transactions on Audio, Speech, and Language Processing
- 2009
A new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed, which addresses some limitations of HMMs while maintaining many of the aspects which have made them successful.
The Application of Hidden Markov Models in Speech Recognition
- Computer Science, Found. Trends Signal Process.
- 2007
The aim of this review is first to present the core architecture of an HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance.
Acoustic Modeling Using Deep Belief Networks
- Computer Science, IEEE Transactions on Audio, Speech, and Language Processing
- 2012
It is shown that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters.
Large Margin Hidden Markov Models for Automatic Speech Recognition
- Computer Science, NIPS
- 2006
This work proposes a learning algorithm based on the goal of margin maximization in continuous density hidden Markov models for automatic speech recognition (ASR) using Gaussian mixture models, and obtains competitive results for phonetic recognition on the TIMIT speech corpus.
The acoustic-modeling problem in automatic speech recognition
- Computer Science
- 1987
This thesis is primarily concerned with the use of hidden Markov models to model sequences of feature vectors which lie in a continuous space such as R^N, and explores the trade-off between packing a lot of information into such sequences and being able to model them accurately.
Factor analysed hidden Markov models for speech recognition
- Computer Science, Comput. Speech Lang.
- 2004
Improvements to Deep Convolutional Neural Networks for LVCSR
- Computer Science, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding
- 2013
A deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features is conducted and an effective strategy to use dropout during Hessian-free sequence training is introduced.
fMPE: discriminatively trained features for speech recognition
- Computer Science, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.
- 2005
MPE (minimum phone error) is a previously introduced technique for discriminative training of HMM parameters. fMPE applies the same objective function to the features, transforming the data with a…
Connectionist Speech Recognition: A Hybrid Approach
- Computer Science
- 1993
From the Publisher:
Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous…