Corpus ID: 14345813

Online and Linear-Time Attention by Enforcing Monotonic Alignments

@inproceedings{Raffel2017OnlineAL,
  title={Online and Linear-Time Attention by Enforcing Monotonic Alignments},
  author={Colin Raffel and Minh-Thang Luong and Peter J. Liu and Ron J. Weiss and Douglas Eck},
  booktitle={ICML},
  year={2017}
}
Recurrent neural network models with an attention mechanism have proven to be extremely effective on a wide variety of sequence-to-sequence problems. [...] Based on the insight that the alignment between input and output sequence elements is monotonic in many problems of interest, we propose an end-to-end differentiable method for learning monotonic alignments which, at test time, enables computing attention online and in linear time. We validate our approach on sentence summarization, machine…
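
To make the method concrete, below is a minimal sketch of the test-time hard monotonic decoding the abstract refers to. It is an illustration under stated assumptions, not the authors' reference implementation: the energy matrix is assumed precomputed (a real model produces it online from the decoder state), and the 0.5 threshold and function name are chosen for the example.

import numpy as np

def hard_monotonic_decode(energy, threshold=0.5):
    """Sketch of test-time hard monotonic attention (hypothetical helper).

    energy[i, j] is an assumed precomputed attention energy between output
    step i and encoder state j.  Each output step resumes scanning from the
    position attended to at the previous step and commits to the first
    position whose selection probability crosses the threshold, so the total
    number of inspections is linear in the input and output lengths.
    """
    num_outputs, num_inputs = energy.shape
    attended = []   # chosen encoder index per output step (None = input exhausted)
    j = 0           # scan pointer; it never moves backwards => monotonic
    for i in range(num_outputs):
        chosen = None
        while j < num_inputs:
            p_select = 1.0 / (1.0 + np.exp(-energy[i, j]))  # sigmoid(energy)
            if p_select >= threshold:
                chosen = j   # attend to encoder state j for this output step
                break
            j += 1           # otherwise advance to the next input element
        attended.append(chosen)
    return attended

# Toy usage with random energies for a 4-output, 6-input problem.
rng = np.random.default_rng(0)
print(hard_monotonic_decode(rng.normal(size=(4, 6))))

Because the scan pointer never moves backwards, the total work across all output steps grows linearly with the input and output lengths, which is what enables online, linear-time attention.
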
Citations

Monotonic Chunkwise Attention
Proposes Monotonic Chunkwise Attention (MoChA), which adaptively splits the input sequence into small chunks over which soft attention is computed, and shows that models using MoChA can be trained efficiently with standard backpropagation while allowing online and linear-time decoding at test time.
Sequence-to-sequence models with soft attention have been successfully applied to a wide variety of problems, but their decoding process incurs a quadratic time and space cost and is inapplicable to…
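
As a rough companion to the MoChA summary above, here is a hedged sketch of the chunkwise step: once a hard monotonic scan (as in the earlier sketch) has selected an input position, soft attention is computed only over a small fixed-size window of encoder states ending at that position. The helper names, the chunk size of 4, and the precomputed chunk energies are assumptions for the example, not MoChA's reference code.

import numpy as np

def softmax(x):
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def chunkwise_context(values, chunk_energy, t, chunk_size=4):
    """Sketch of chunkwise soft attention at test time.

    `t` is the input position selected by a hard monotonic scan; soft
    attention weights are computed only over the `chunk_size` encoder
    states ending at `t`, so the per-step cost is constant rather than
    proportional to the full input length.
    """
    start = max(0, t - chunk_size + 1)
    weights = softmax(chunk_energy[start:t + 1])   # attention within the chunk
    return weights @ values[start:t + 1]           # context vector

# Hypothetical usage: 10 encoder states of dimension 3, chunk ending at t=6.
rng = np.random.default_rng(0)
values = rng.normal(size=(10, 3))
chunk_energy = rng.normal(size=10)
print(chunkwise_context(values, chunk_energy, t=6))

Because the window has constant size, each output step's soft attention costs O(chunk_size) rather than O(input length).
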
Monotonic Multihead Attention
This paper proposes a new attention mechanism, Monotonic Multihead Attention (MMA), which extends the monotonic attention mechanism to multihead attention and introduces two novel and interpretable approaches for latency control that are specifically designed for multiple attention heads.
Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing
Experimental results on ASR, G2P, and machine translation between two languages with similar sentence structures demonstrate that the proposed encoder-decoder model with local monotonic attention achieves significant performance improvements and reduces computational complexity compared with a standard global attention architecture.
Enhancing Monotonicity for Robust Autoregressive Transformer TTS
Proposes a monotonicity-enhancing approach that combines Stepwise Monotonic Attention (SMA) with multi-head attention for a Transformer-based TTS system, reducing bad cases from 53 of 500 sentences to 1.
Fast and Accurate Reordering with ITG Transition RNN
Follows the traditional pre-reordering approach of decoupling reordering from translation: a reordering RNN that shares the input encoder with the decoder is added, and a transition system is proposed that parses the input into trees for reordering.
Multi-Scale Alignment and Contextual History for Attention Mechanism in Sequence-to-Sequence Model
Proposes two ideas for improving sequence-to-sequence model performance by enhancing the attention module, and shows that the proposed extensions improve performance significantly over a standard attention baseline.
Latent Alignment and Variational Attention
Considers variational attention networks as alternatives to soft and hard attention for learning latent-variable alignment models, with tighter approximation bounds based on amortized variational inference, and proposes methods for reducing the variance of gradients to make these approaches computationally feasible.
A study of latent monotonic attention variants
Presents a mathematically clean way to introduce monotonicity by adding a new latent variable that represents the audio position or segment boundaries, and compares several monotonic latent-variable models to the authors' global soft attention baseline.
Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition
Proposes CTC synchronous training (CTC-ST), in which CTC alignments are leveraged as a reference for token boundaries, enabling a MoChA model to learn optimal monotonic input-output alignments and to reduce recognition errors and emission latency simultaneously.

References

Showing 1-10 of 64 references.
Sequence to Sequence Transduction with Hard Monotonic Attention
Presents an analysis of the learned representations of both hard and soft attention models, shedding light on the features such models extract in order to solve the task.
Online and Linear-Time Attention by Enforcing Monotonic Alignments
A. Algorithms: Below are algorithms for the hard monotonic decoding process we used at test time (Algorithm 1) and the approach for computing its expected output that we used to train the network…
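
For flavor, the following is a minimal sketch of the training-time computation referenced here: the expected (soft) monotonic alignment induced by per-position selection probabilities. It assumes the selection probabilities p[i, j] are given, and tracks q[i, j] = alpha[i, j] / p[i, j] so that no division is needed; it is a sequential illustration under these assumptions, not the vectorized formulation, and indexing conventions may differ from the paper's.

import numpy as np

def expected_monotonic_alignment(p):
    """Expected monotonic attention weights used during training.

    p[i, j] is the probability that output step i selects input position j
    when it inspects it.  alpha[i, j] is the induced probability that step i
    attends to position j, computed with the recurrence
        q[i, j]     = (1 - p[i, j-1]) * q[i, j-1] + alpha[i-1, j]
        alpha[i, j] = p[i, j] * q[i, j]
    which avoids a potentially unstable division by p.
    """
    num_out, num_in = p.shape
    alpha = np.zeros_like(p)
    alpha_prev = np.zeros(num_in)
    alpha_prev[0] = 1.0            # the "previous" step before decoding starts
    for i in range(num_out):       # is pinned to the first input position
        q = 0.0
        for j in range(num_in):
            one_minus_p_prev = 1.0 if j == 0 else 1.0 - p[i, j - 1]
            q = one_minus_p_prev * q + alpha_prev[j]
            alpha[i, j] = p[i, j] * q
        alpha_prev = alpha[i]
    return alpha

# Toy check: each row of alpha sums to at most 1.
rng = np.random.default_rng(0)
p = rng.uniform(size=(3, 5))
print(expected_monotonic_alignment(p).sum(axis=1))

Each row of alpha sums to at most one; the missing mass is the probability that the scan reaches the end of the input without selecting any position.
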
Learning online alignments with continuous rewards policy gradient
This work presents a new method for solving sequence-to-sequence problems using hard online alignments instead of soft offline alignments, and achieves encouraging performance on the TIMIT and Wall Street Journal speech recognition datasets.
Online Segment to Segment Neural Transduction
An online neural sequence-to-sequence model that learns to alternate between encoding and decoding segments of the input as it is read, tackling the bottleneck of vanilla encoder-decoders, which must read and memorize the entire input sequence in their fixed-length hidden states.
End-To-End Memory Networks
A neural network with a recurrent attention model over a possibly large external memory that is trained end-to-end and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.
Sequence to Sequence Learning with Neural Networks
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence that made the optimization problem easier.
Towards Better Decoding and Language Model Integration in Sequence to Sequence Models
Analyses an attention-based seq2seq speech recognition system that directly transcribes recordings into characters, observing two shortcomings: overconfidence in its predictions and a tendency to produce incomplete transcriptions when language models are used.
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
This work proposes several novel models that address critical problems in summarization not adequately handled by the basic architecture, such as modeling keywords, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.
Efficient Summarization with Read-Again and Copy Mechanism
Introduces a simple mechanism that first reads the input sequence before committing to a representation of each word, and proposes a simple copy mechanism that can exploit very small vocabularies and handle out-of-vocabulary words.
Effective Approaches to Attention-based Neural Machine Translation
Examines a global approach, which always attends to all source words, and a local approach, which only looks at a subset of source words at a time, demonstrating the effectiveness of both on the WMT translation tasks between English and German in both directions.