Simplifying very deep convolutional neural network architectures for robust speech recognition
@inproceedings{Rownicka2017SimplifyingVD,
  title     = {Simplifying very deep convolutional neural network architectures for robust speech recognition},
  author    = {Joanna Rownicka and Steve Renals and Peter Bell},
  booktitle = {2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year      = {2017},
  pages     = {236--243}
}
Very deep convolutional neural networks (VDCNNs) have been successfully used in computer vision. More recently VDCNNs have been applied to speech recognition, using architectures adopted from computer vision. In this paper, we experimentally analyse the role of the components in VDCNN architectures for robust speech recognition. We have proposed a number of simplified VDCNN architectures, taking into account the use of fully-connected layers and down-sampling approaches. We have investigated…
9 Citations
Evaluation of Modified Deep Neural Network Architecture Performance for Speech Recognition
- Computer Science · 2018 International Conference on Intelligent and Advanced System (ICIAS)
- 2018
Four different Deep Neural Network (DNN) architectures are proposed and compared in terms of accuracy and training time; the modified triangular architecture gave the highest accuracy.
Multi-Scale Octave Convolutions for Robust Speech Recognition
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
It is argued that octave convolutions likewise improve the robustness of learned representations due to the use of average pooling in the lower resolution group, acting as a low-pass filter, while improving the computational efficiency of the CNN acoustic models.
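The snippet above attributes the robustness gain to average pooling in the low-resolution group acting as a low-pass filter. A minimal pure-Python sketch of that mechanism, 2×2 average pooling with stride 2 (the function name and the toy feature map are illustrative, not taken from the paper):

```python
def avg_pool_2x2(x):
    """2x2 average pooling with stride 2 over a 2-D feature map.

    In octave convolutions, the low-frequency group is held at half
    resolution via average pooling, which acts as a low-pass filter.
    """
    h, w = len(x), len(x[0])
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]

# toy 4x4 feature map
fmap = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 4.0, 6.0, 8.0],
    [2.0, 4.0, 6.0, 8.0],
]
low = avg_pool_2x2(fmap)  # half-resolution, smoothed map
```

Each output cell is the mean of a 2×2 neighbourhood, so high-frequency detail (including high-frequency noise) is attenuated while the map shrinks to half resolution.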
Analyzing Deep CNN-Based Utterance Embeddings for Acoustic Model Adaptation
- Computer Science · 2018 IEEE Spoken Language Technology Workshop (SLT)
- 2018
It is found that deep CNN embeddings outperform DNN embeddings for acoustic model adaptation, and auxiliary features based on deep CNN embeddings result in similar word error rates to i-vectors.
Embeddings for DNN Speaker Adaptive Training
- Computer Science · 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2019
The performance of a given representation for speaker recognition is not correlated with its ASR performance; in fact, the ability to capture more speech attributes than just speaker identity was the most important characteristic of the embeddings for efficient DNN-SAT ASR.
Novel Demodulation-Based Features using Classifier-level Fusion of GMM and CNN for Replay Detection
- Computer Science · 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)
- 2018
The architecture in which max-pooling was replaced with a convolutional layer, together with FC layers, performed relatively better on most of the AM-FM feature sets than the other CNNs, and the ESA-based AM features performed better because AM fluctuates less than FM during model training.
Automatic Database Segmentation using Hybrid Spectrum-Visual Approach
- Computer Science · The Egyptian Journal of Language Engineering
- 2021
A novel method for segmenting speech phonemes is presented; the proposed strategy aids the selection of an appropriate feature-extraction technique for speech segmentation and has the potential to be used in applications such as automatic speech recognition and automatic language identification.
An Art of Speech Recognition: A Review
- Computer Science · 2019 2nd International Conference on Signal Processing and Communication (ICSPC)
- 2019
This paper provides a literature review of the various feature-extraction and classification methods used in speech recognition systems.
Design of Countermeasures for Replay Spoof Speech Attack
- Education
- 2018
Simplifying very deep convolutional neural network architectures for robust speech recognition
- Computer Science · 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2017
A proposed model consisting solely of convolutional (conv) layers, and without any fully-connected layers, achieves a lower word error rate on Aurora 4 compared to other VDCNN architectures typically used in speech recognition.
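The all-convolutional model described above drops the fully-connected layers entirely. One common way such a network produces output scores without FC layers is global average pooling over the final feature maps; the sketch below illustrates that general idea under those assumptions and is not necessarily the authors' exact design:

```python
def global_avg_pool(feature_maps):
    """Collapse each feature map to a single score by averaging.

    With one map per output class (e.g. from a final 1x1 convolution),
    this replaces the fully-connected classification layers of a CNN.
    """
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
            for fm in feature_maps]

# two toy 2x2 "class score" maps (values are illustrative)
maps = [
    [[0.0, 1.0], [2.0, 5.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
scores = global_avg_pool(maps)  # one score per class
```

Because the pooling averages over whatever spatial extent remains, the network has no fixed-size FC weight matrix and far fewer parameters, which is part of the appeal of all-conv designs.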
References
Showing 1-10 of 30 references
Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2016
The proposed very deep CNNs can significantly reduce word error rate (WER) for noise robust speech recognition and are competitive with the long short-term memory recurrent neural networks (LSTM-RNN) acoustic model.
Very deep convolutional neural networks for robust speech recognition
- Computer Science · 2016 IEEE Spoken Language Technology Workshop (SLT)
- 2016
The extension and optimisation of previous work on very deep convolutional neural networks for effective recognition of noisy speech on the Aurora 4 task are described, and it is shown that state-level weighted log-likelihood score combination in a joint acoustic model decoding scheme is very effective.
Convolutional Neural Networks for Speech Recognition
- Computer Science · IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2014
It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
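The limited-weight-sharing scheme mentioned above shares convolution weights only within a frequency band, on the grounds that spectral patterns differ between low and high frequencies. A toy pure-Python sketch of the idea (the band split and filter values are made up for illustration):

```python
def lws_conv(x, band_filters):
    """Limited weight sharing: split the input along the frequency
    axis into equal bands and convolve each band with its own filter,
    so weights are shared only within a band, not across all of it
    (full weight sharing would use one filter everywhere).
    """
    band = len(x) // len(band_filters)
    out = []
    for b, w in enumerate(band_filters):
        seg = x[b * band:(b + 1) * band]
        k = len(w)
        out.append([sum(w[t] * seg[i + t] for t in range(k))
                    for i in range(band - k + 1)])
    return out

spectrum = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # 6 frequency bins
filters = [[0.5, 0.5], [1.0, -1.0]]         # one 2-tap filter per band
y = lws_conv(spectrum, filters)             # per-band outputs
```

Here the low band is smoothed while the high band is differenced, which a single shared filter could not do; the cost is more parameters and the inability to stack further shared-weight conv layers on top, a trade-off discussed in the CNN-for-ASR literature.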
Advances in Very Deep Convolutional Neural Networks for LVCSR
- Computer Science · INTERSPEECH
- 2016
This paper proposes a new CNN design without time padding and without time pooling, which is slightly suboptimal for accuracy but has two significant advantages: it enables sequence training and deployment by allowing efficient convolutional evaluation of full utterances, and it allows batch normalization to be straightforwardly adopted in CNNs on sequence data.
Very deep multilingual convolutional neural networks for LVCSR
- Computer Science · 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
A very deep convolutional network architecture with up to 14 weight layers and small 3×3 kernels, inspired by the VGG ImageNet 2014 architecture, is introduced, along with multilingual CNNs with multiple untied layers.
Deep Convolutional Neural Networks for Large-scale Speech Tasks
- Computer Science · Neural Networks
- 2015
Improvements to Deep Convolutional Neural Networks for LVCSR
- Computer Science · 2013 IEEE Workshop on Automatic Speech Recognition and Understanding
- 2013
A deep analysis comparing limited weight sharing and full weight sharing with state-of-the-art features is conducted and an effective strategy to use dropout during Hessian-free sequence training is introduced.
An analysis of convolutional neural networks for speech recognition
- Computer Science · 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
By visualizing the localized filters learned in the convolutional layer, it is shown that edge detectors in varying directions can be automatically learned, and it is established that the CNN structure combined with maxout units is the most effective model under small-size constraints for deploying small-footprint models to devices.
Convolutional Neural Networks for Distant Speech Recognition
- Computer Science · IEEE Signal Processing Letters
- 2014
This work investigates convolutional neural networks for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM), and proposes a channel-wise convolution with two-way pooling.
Deep convolutional neural networks for LVCSR
- Computer Science · 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.