• Corpus ID: 7743432

End-to-End Deep Neural Network for Automatic Speech Recognition

@inproceedings{Song2015EndtoEndDN,
  title={End-to-End Deep Neural Network for Automatic Speech Recognition},
  author={Will Song},
  year={2015}
}
  • W. Song
  • Published 2015
  • Computer Science
We investigate the efficacy of deep neural networks for speech recognition. Specifically, we implement an end-to-end deep learning system that maps mel-filterbank features directly to spoken phonemes without requiring a traditional Hidden Markov Model for decoding. The system comprises two variants of neural networks for phoneme recognition: a convolutional architecture for frame-level classification and a recurrent architecture with Connectionist Temporal Classification (CTC)…
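
As an illustration of the pipeline the abstract describes (mel-filterbank frames fed to a convolutional front end, a recurrent layer over the frame sequence, and CTC training without frame-level alignments), here is a minimal PyTorch sketch. The layer sizes, the GRU choice, and the 61-phoneme label set are illustrative assumptions, not details taken from the paper.

# Minimal sketch of the kind of pipeline the abstract describes:
# mel-filterbank frames -> CNN frame encoder -> bidirectional RNN -> CTC loss.
# All dimensions and layer choices are illustrative assumptions.
import torch
import torch.nn as nn

class CnnRnnCtc(nn.Module):
    def __init__(self, n_mels=40, n_phonemes=61, hidden=256):
        super().__init__()
        # Convolutional front end over (batch, 1, time, n_mels) spectrograms.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Recurrent layer models temporal context across frames.
        self.rnn = nn.GRU(32 * n_mels, hidden, bidirectional=True, batch_first=True)
        # Per-frame logits over phonemes plus the CTC blank symbol.
        self.fc = nn.Linear(2 * hidden, n_phonemes + 1)

    def forward(self, mel):                              # mel: (batch, time, n_mels)
        x = self.cnn(mel.unsqueeze(1))                   # (batch, 32, time, n_mels)
        x = x.permute(0, 2, 1, 3).flatten(2)             # (batch, time, 32 * n_mels)
        x, _ = self.rnn(x)
        return self.fc(x)                                # (batch, time, n_phonemes + 1)

model = CnnRnnCtc()
ctc = nn.CTCLoss(blank=0)
mel = torch.randn(8, 200, 40)                            # dummy batch of mel-filterbank features
log_probs = model(mel).log_softmax(-1).transpose(0, 1)   # CTCLoss expects (time, batch, classes)
targets = torch.randint(1, 62, (8, 30))                  # dummy phoneme label sequences
input_lens = torch.full((8,), 200, dtype=torch.long)
target_lens = torch.full((8,), 30, dtype=torch.long)
loss = ctc(log_probs, targets, input_lens, target_lens)  # no frame-level alignment needed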


Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
TLDR
This paper proposes an end-to-end speech recognition framework for sequence labeling that combines hierarchical CNNs with CTC directly, without recurrent connections, and argues that CNNs can model temporal correlations given appropriate context information.
Deep Neural Networks for Acoustic Modeling in the Presence of Noise
TLDR
This work combines two well-known deep learning architectures, a convolutional neural network (CNN) for acoustic modeling and a recurrent architecture with connectionist temporal classification (CTC) for sequence modeling, in order to decode the frames of a sequence into a word.
End-to-End Speech Recognition Model Based on Deep Learning for Albanian
  • Amarildo Rista, A. Kadriu
  • Computer Science
    2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO)
  • 2021
TLDR
An end-to-end speech recognition model for the Albanian language based on a Recurrent Neural Network (RNN) architecture, created and implemented in PyTorch.
Automatic Speech Recognition using different Neural Network Architectures – A Survey
TLDR
A comparative study of the advantages of the surveyed architectures with respect to Word Error Rate, Phone Error Rate, and other metrics in Automatic Speech Recognition (ASR).
CNN-Self-Attention-DNN Architecture For Mandarin Recognition
TLDR
This paper proposes a CNN-Self-Attention-DNN CTC architecture that uses self-attention in place of an RNN and combines it with a CNN and a deep neural network (DNN).
An Overview of End-to-End Automatic Speech Recognition
TLDR
The article focuses on the principles, progress, and research hotspots of three different end-to-end models (connectionist temporal classification (CTC)-based, recurrent neural network (RNN) transducer, and attention-based) and makes detailed theoretical and experimental comparisons.
LAS-Transformer: An Enhanced Transformer Based on the Local Attention Mechanism for Speech Recognition
TLDR
This work proposes a local attention Transformer model for speech recognition that exploits the high correlation among speech frames by adding local attention based on parametric positional relations to the self-attention module, improving the Transformer's generalization to speech sequences of different lengths.
Performance Analysis and Recognition of Speech using Recurrent Neural Network
TLDR
This paper implements an RNN to analyze and recognize speech from a set of spoken words, enabling machines to recognize and understand what people are saying.
Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation
TLDR
This work proposes to expand the training set at the data level by applying different audio codecs with varied bit rate, sampling rate, and bit depth, which ensures variation in the input data without drastically affecting audio quality.
Automatic Multilingual Speech Recognition
TLDR
This study presents an improved LIS-Net model for end-to-end Vietnamese and Chinese ASR and proposes a new method of coding labels for multiple languages by paginating labels by language.
...
...
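
Several of the citing papers above, like the original system, rely on CTC to turn per-frame outputs into a label sequence. The following is a generic greedy (best-path) CTC decoding sketch in PyTorch, not the decoder of any particular paper listed here: take the argmax class per frame, collapse consecutive repeats, and drop blanks.

# Illustrative greedy (best-path) CTC decoding: argmax per frame,
# collapse consecutive repeats, remove blanks. Generic sketch only.
import torch

def greedy_ctc_decode(log_probs, blank=0):
    """log_probs: (time, classes) frame-level log-probabilities for one utterance."""
    best_path = log_probs.argmax(dim=-1).tolist()         # most likely class per frame
    decoded, previous = [], blank
    for symbol in best_path:
        if symbol != blank and symbol != previous:         # collapse repeats, skip blanks
            decoded.append(symbol)
        previous = symbol
    return decoded

frame_log_probs = torch.randn(200, 62).log_softmax(-1)    # dummy output for one utterance
print(greedy_ctc_decode(frame_log_probs))                  # e.g. [17, 3, 42, ...]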

References

SHOWING 1-10 OF 18 REFERENCES
Towards End-To-End Speech Recognition with Recurrent Neural Networks
This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the…
Speech recognition with deep recurrent neural networks
TLDR
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Hybrid speech recognition with Deep Bidirectional LSTM
TLDR
The hybrid approach with DBLSTM appears to be well suited for tasks where acoustic modelling predominates, and the improvement in word error rate over the deep network is modest, despite a great increase in frame-level accuracy.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Lexicon-Free Conversational Speech Recognition with Neural Networks
TLDR
An approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure, making it possible to directly train a speech recognizer using errors generated by spoken language understanding tasks.
Deep convolutional neural networks for LVCSR
TLDR
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
TLDR
This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
Recent advances in deep learning for speech research at Microsoft
  • L. Deng, Jinyu Li, A. Acero
  • Computer Science
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
TLDR
An overview of the work by Microsoft speech researchers since 2009 is provided, focusing on more recent advances which shed light on the basic capabilities and limitations of current deep learning technology.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
...
...