Corpus ID: 1921173

Attention-Based Models for Speech Recognition

@inproceedings{Chorowski2015AttentionBasedMF,
  title={Attention-Based Models for Speech Recognition},
  author={Jan Chorowski and Dzmitry Bahdanau and Dmitriy Serdyuk and Kyunghyun Cho and Yoshua Bengio},
  booktitle={NIPS},
  year={2015}
}
Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis [1,2] and image caption generation [3]. [...] Key Method: We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue.
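The key method can be made concrete with a short sketch. Below is a minimal NumPy implementation of one attention step in which, in addition to the usual content-based terms, the scoring function receives convolutional features computed from the previous step's attention weights; that is the sense in which the mechanism is location-aware. The parameter names, shapes, and exact parameterisation are illustrative assumptions, not the paper's verbatim equations.

```python
import numpy as np

def location_aware_attention(h, s_prev, alpha_prev, W, V, U, w, f_kernel, b):
    """One step of location-aware attention (illustrative sketch).

    h          : (T, H) encoder states
    s_prev     : (D,)   previous decoder state
    alpha_prev : (T,)   attention weights from the previous output step
    W : (A, D), V : (A, H), U : (A, 1), w : (A,), b : (A,), f_kernel : (K,)
    """
    # Location features: a 1-D convolution over the previous alignment.
    f = np.convolve(alpha_prev, f_kernel, mode="same")               # (T,)
    # Additive (tanh) energies, extended with the location term.
    e = np.tanh(s_prev @ W.T + h @ V.T + f[:, None] * U.T + b) @ w   # (T,)
    # Normalise into attention weights and form the context vector.
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()
    context = alpha @ h                                              # (H,)
    return alpha, context

# Tiny usage example with random parameters (shapes only, not trained values).
T, H, D, A, K = 50, 8, 6, 10, 5
rng = np.random.default_rng(0)
alpha, context = location_aware_attention(
    h=rng.normal(size=(T, H)),
    s_prev=rng.normal(size=D),
    alpha_prev=np.full(T, 1.0 / T),
    W=rng.normal(size=(A, D)), V=rng.normal(size=(A, H)),
    U=rng.normal(size=(A, 1)), w=rng.normal(size=A),
    f_kernel=rng.normal(size=K), b=np.zeros(A),
)
```

The intent of the location term is to let the scorer take into account where the model attended at the previous step, rather than relying on content similarity alone.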
A Time-Restricted Self-Attention Layer for ASR
TLDR
This paper applies a restricted self-attention mechanism (with multiple heads) to speech recognition, tries introducing attention layers into TDNN architectures, and replaces LSTM layers with attention layers in TDNN+LSTM architectures.
Unidirectional Memory-Self-Attention Transducer for Online Speech Recognition
TLDR
The experiments demonstrate that the proposed models improve WER over Restricted-Self-Attention models by 13.5% relative on WSJ and 7.1% relative on SWBD, without a large increase in computation cost.
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models
  • C. Chiu, T. Sainath, +11 authors M. Bacchiani
  • Computer Science, Engineering
  • 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2018
TLDR
A variety of structural and optimization improvements to the Listen, Attend, and Spell model are explored, which significantly improve performance, and a multi-head attention architecture is introduced, which offers improvements over the commonly-used single-head attention.
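To illustrate the multi-head idea mentioned in this summary, the following is a generic NumPy sketch in which several attention heads attend to the encoder output in parallel and their context vectors are concatenated. It uses scaled dot-product scoring for brevity; the attention function actually used in the paper may differ, so treat this as a schematic rather than the paper's model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, keys, values, Wq, Wk, Wv, Wo, num_heads):
    """Generic multi-head (scaled dot-product) attention for one decoder step.

    query : (Dq,), keys : (T, Dk), values : (T, Dv)
    Wq : (Dq, num_heads*d), Wk : (Dk, num_heads*d),
    Wv : (Dv, num_heads*d), Wo : (num_heads*d, Dout)
    """
    q = query @ Wq                                        # (num_heads*d,)
    k = keys @ Wk                                         # (T, num_heads*d)
    v = values @ Wv                                       # (T, num_heads*d)
    d = q.shape[0] // num_heads
    q = q.reshape(num_heads, d)                           # (heads, d)
    k = k.reshape(-1, num_heads, d).transpose(1, 0, 2)    # (heads, T, d)
    v = v.reshape(-1, num_heads, d).transpose(1, 0, 2)    # (heads, T, d)
    scores = np.einsum("hd,htd->ht", q, k) / np.sqrt(d)   # (heads, T)
    alpha = softmax(scores, axis=-1)                      # each head attends separately
    heads = np.einsum("ht,htd->hd", alpha, v)             # (heads, d)
    return heads.reshape(-1) @ Wo                         # concatenate heads, project out
```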
End-to-end attention-based large vocabulary speech recognition
TLDR
This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.
Attention-Based End-to-End Speech Recognition on Voice Search
TLDR
This paper uses character embedding to deal with the large vocabulary of Mandarin speech recognition, compares two attention mechanisms, and uses attention smoothing to cover long context in the attention model.
An Analysis of Local Monotonic Attention Variants
TLDR
A simple technique to implement windowed attention is presented, which can be applied on top of an existing global attention model; it is shown that the proposed model can be trained from random initialization and achieves results comparable to the global attention baseline.
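A hedged sketch of the windowed idea: take the unnormalised scores produced by a global attention model and mask everything outside a window before normalising. The rule used here for placing the window (around a caller-supplied center such as the previous attention peak) and the parameter names are assumptions for illustration, not the specific variants analysed in the paper.

```python
import numpy as np

def windowed_attention(energies, center, half_width):
    """Mask global attention energies outside a window before the softmax.

    energies   : (T,) unnormalised global attention scores for one decoder step
    center     : frame index the window is placed around, e.g. the previous
                 step's attention peak (an assumption, not the paper's exact rule)
    half_width : number of encoder frames kept on each side of the center
    """
    T = energies.shape[0]
    lo, hi = max(0, center - half_width), min(T, center + half_width + 1)
    masked = np.full(T, -np.inf)
    masked[lo:hi] = energies[lo:hi]
    # Softmax restricted to the window: frames outside get exactly zero weight.
    masked = masked - masked[lo:hi].max()
    alpha = np.exp(masked)
    return alpha / alpha.sum()

# Example: window of +/- 3 frames around frame 10 of a 50-frame utterance.
scores = np.random.default_rng(0).normal(size=50)
alpha = windowed_attention(scores, center=10, half_width=3)
```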
Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition
TLDR
This work designs an alternative student network that, instead of using a thinner or a shallower model, keeps the original architecture of the teacher model but with shorter sequences (fewer encoder and decoder states), and learns to mimic the same alignment between the current input short speech segments and the transcription.
Neural Incremental Speech Recognition Through Attention Transfer
One of the challenges in building a simultaneous speech translation system is developing incremental ASR (ISR). Hidden Markov model (HMM) ASR [10, 5] performs a [...]
Attention-Based End-to-End Speech Recognition in Mandarin
TLDR
This paper explores the use of an attention-based encoder-decoder model for Mandarin speech recognition and achieves a first promising result; it reduces the source sequence length by skipping frames and regularizes the weights for better generalization and convergence.
Encoder Transfer for Attention-based Acoustic-to-word Speech Recognition
Acoustic-to-word speech recognition based on attention-based encoder-decoder models achieves better accuracies with much lower latency than conventional speech recognition systems. However, [...]

References

Showing 1-10 of 42 references
Speech recognition with deep recurrent neural networks
TLDR
This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs.
Deep Speech: Scaling up end-to-end speech recognition
TLDR
Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.
Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition
  • L. Tóth
  • Computer Science
  • 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
TLDR
The two network architectures, convolution along the frequency axis and time-domain convolution, can be readily combined; the combined model achieves an error rate of 16.7% on the TIMIT phone recognition task, a new record on this dataset.
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
Towards End-To-End Speech Recognition with Recurrent Neural Networks
This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the [...]
End-To-End Memory Networks
TLDR
A neural network with a recurrent attention model over a possibly large external memory that is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
TLDR
Initial results demonstrate that this new approach achieves phoneme error rates that are comparable to the state-of-the-art HMM-based decoders, on the TIMIT dataset.
Sequence Transduction with Recurrent Neural Networks
  • A. Graves
  • Computer Science, Mathematics
  • ArXiv
  • 2012
TLDR
This paper introduces an end-to-end, probabilistic sequence transduction system, based entirely on RNNs, that is in principle able to transform any input sequence into any finite, discrete output sequence.
Neural Machine Translation by Jointly Learning to Align and Translate
TLDR
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
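For reference, the basic additive ("soft-search") attention introduced in this paper, which the location-aware mechanism of the main paper extends, can be sketched as follows; shapes and parameter names are illustrative assumptions.

```python
import numpy as np

def additive_attention(h, s_prev, W, V, w):
    """Bahdanau-style additive attention over source annotations (sketch).

    h      : (T, H) encoder annotations
    s_prev : (D,)   previous decoder state
    W : (A, D), V : (A, H), w : (A,)
    """
    e = np.tanh(s_prev @ W.T + h @ V.T) @ w   # (T,) alignment energies
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                      # attention weights over source positions
    context = alpha @ h                       # expected annotation, fed to the decoder
    return alpha, context
```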
The Application of Hidden Markov Models in Speech Recognition
TLDR
The aim of this review is first to present the core architecture of an HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance.