Towards End-To-End Speech Recognition with Recurrent Neural Networks
@inproceedings{Graves2014TowardsES, title={Towards End-To-End Speech Recognition with Recurrent Neural Networks}, author={Alex Graves and Navdeep Jaitly}, booktitle={ICML}, year={2014} }
This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. Key result: Combining the network with a baseline system further reduces the error rate to 6.7%.
1,739 Citations
End-to-End Speech Recognition Using Connectionist Temporal Classification
- Computer Science
- 2018
Results show that the use of convolutional input layers is advantageous compared to dense ones, and suggest that the number of recurrent layers has a significant impact on the results.
End-to-End Deep Neural Network for Automatic Speech Recognition
- Computer Science
- 2015
An end-to-end deep learning system is implemented that uses mel-filter bank features to directly output spoken phonemes, without the need for a traditional Hidden Markov Model for decoding.
Lexicon-Free Conversational Speech Recognition with Neural Networks
- Computer Science, NAACL
- 2015
An approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure, making it possible to directly train a speech recognizer using errors generated by spoken language understanding tasks.
A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition
- Computer Science, INTERSPEECH
- 2015
This paper studies the RNN encoder-decoder approach for large vocabulary end-to-end speech recognition, whereby an encoder transforms a sequence of acoustic vectors into a sequence of feature representations, from which a decoder recovers a sequence of words.
End-to-End Online Speech Recognition with Recurrent Neural Networks
- Computer Science
- 2017
An efficient GPU-based RNN training framework for the truncated backpropagation through time (BPTT) algorithm, which is suitable for online (continuous) training, and an online version of the connectionist temporal classification (CTC) loss computation algorithm, where the original CTC loss is estimated with a partial sliding window.
Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer
- Computer Science, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2017
This work investigates training end-to-end speech recognition models with the recurrent neural network transducer (RNN-T) and finds that performance can be improved further through the use of sub-word units ('wordpieces') which capture longer context and significantly reduce substitution errors.
Automatic Speech Recognition using different Neural Network Architectures – A Survey
- Computer Science
- 2016
A comparative study regarding the advantages of the architectures discussed during the survey with respect to Word Error Rate, Phone Error Rate etc. in the area of Automatic Speech Recognition (ASR) is concluded.
End to End Speech Recognition System
- Computer Science
- 2017
An end-to-end speech recognition system that directly transcribes audio data with text/phonemes is explained; the system tries to replace the conventional speech recognition pipeline with a single recurrent neural network (RNN) architecture based on the combination of a deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function.
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
- Computer Science, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional…
Towards an end-to-end speech recognizer for Portuguese using deep neural networks
- Computer Science
- 2017
This first effort shows that an all-neural high-performance speech recognition system for PT-BR is feasible and achieves a label error rate about 17% higher than commercial systems with a language model.
References
Speech recognition with deep recurrent neural networks
- Computer Science, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
From speech to letters - using a novel neural network architecture for grapheme based ASR
- Computer Science, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding
- 2009
This work investigates a novel ASR approach using Bidirectional Long Short-Term Memory Recurrent Neural Networks and Connectionist Temporal Classification, which is capable of transcribing graphemes directly and yields results highly competitive with phoneme transcription.
Deep Neural Networks for Acoustic Modeling in Speech Recognition
- Computer Science
- 2012
This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.
Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition
- Computer Science, INTERSPEECH
- 2012
This paper reports results of a DBN-pretrained context-dependent ANN/HMM system trained on two datasets that are much larger than any reported previously; the system outperforms the best Gaussian Mixture Model Hidden Markov Model baseline.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
- Computer Science, ICML
- 2006
This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
Bidirectional recurrent neural networks
- Computer Science, IEEE Trans. Signal Process.
- 1997
It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
Connectionist Speech Recognition: A Hybrid Approach
- Computer Science
- 1993
From the Publisher:
Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous…
Open vocabulary speech recognition with flat hybrid models
- Computer Science, INTERSPEECH
- 2005
It is demonstrated that, by using a simple flat hybrid model, a well-optimized state-of-the-art speech recognition system over a wide range of out-of-vocabulary rates can be significantly improved.
Framewise phoneme classification with bidirectional LSTM and other neural network architectures
- Computer Science, Neural Networks
- 2005
Supervised Sequence Labelling with Recurrent Neural Networks
- Computer Science, Studies in Computational Intelligence
- 2008
A new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.