Modeling Coverage for Neural Machine Translation

@article{Tu2016ModelingCF,
  title={Modeling Coverage for Neural Machine Translation},
  author={Zhaopeng Tu and Zhengdong Lu and Yang Liu and Xiaohua Liu and Hang Li},
  journal={arXiv: Computation and Language},
  year={2016}
}
The attention mechanism has enhanced state-of-the-art Neural Machine Translation (NMT) by jointly learning to align and translate. However, it tends to ignore past alignment information, which often leads to over-translation and under-translation. To address this problem, we propose coverage-based NMT in this paper. We maintain a coverage vector to keep track of the attention history. The coverage vector is fed to the attention model to help adjust future attention, which lets the NMT system to…
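To make the coverage idea from the abstract concrete, the following is a minimal NumPy sketch of one decoding step with coverage-augmented attention. It uses the simplest accumulation-style coverage (summing past attention weights per source word) and an additive attention score extended with a coverage term; the variable names (W_a, U_a, V_c, v_a), the dimensions, and the toy decoder update are illustrative assumptions rather than the paper's exact parameterization, which also considers fertility normalization and a learned, neural-network-based coverage update.

import numpy as np

rng = np.random.default_rng(0)
src_len, d = 6, 4                      # source sentence length, hidden size

H = rng.normal(size=(src_len, d))      # encoder annotations h_1 .. h_J
W_a = rng.normal(size=(d, d))          # projects the previous decoder state s_{t-1}
U_a = rng.normal(size=(d, d))          # projects each encoder annotation h_j
V_c = rng.normal(size=(d,))            # projects the scalar coverage C_{t-1,j}
v_a = rng.normal(size=(d,))            # scoring vector

def attend(s_prev, coverage):
    """One decoding step: score each source word, conditioned on its coverage."""
    # e_{t,j} = v_a^T tanh(W_a s_{t-1} + U_a h_j + V_c * C_{t-1,j})
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ H[j] + V_c * coverage[j])
                       for j in range(src_len)])
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()               # attention weights alpha_{t,j}
    context = alpha @ H                # context vector c_t
    return context, alpha, coverage + alpha   # accumulate attention history

s = np.zeros(d)                        # toy decoder state
coverage = np.zeros(src_len)           # C_0 = 0: nothing attended yet
for t in range(3):                     # a few decoding steps
    context, alpha, coverage = attend(s, coverage)
    s = np.tanh(context)               # stand-in for the actual decoder state update
print(np.round(coverage, 3))           # per-word coverage grows as words are attended

Feeding coverage[j] into the score is what lets the model penalize re-attending to already-covered source words (reducing over-translation) and steer attention toward words that have received little attention so far (reducing under-translation).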

Citations

An Effective Coverage Approach for Attention-based Neural Machine Translation
TLDR
This work proposes a simple approach using coverage techniques that can be used in conjunction with a diverse range of attention models to improve translation quality on both English-Vietnamese and Japanese-Vietnamese language pairs.
History attention for source-target alignment in neural machine translation
TLDR
A history attention structure that takes advantage of translated information is proposed; it easily captures history information, helps the model alleviate the memory-vanishing problem introduced by long sentences, and avoids focusing on a single local part.
Temporal Attention Model for Neural Machine Translation
TLDR
A novel mechanism is proposed that improves NMT attention by memorizing the alignments temporally and modulating the attention with the accumulated temporal memory as the decoder generates the candidate translation.
Neural Machine Translation with Supervised Attention
TLDR
Experiments on two Chinese-to-English translation tasks show that the supervised attention mechanism yields better alignments leading to substantial gains over the standard attention based NMT.
Generating Alignments Using Target Foresight in Attention-Based Neural Machine Translation
TLDR
This work proposes an extension of the attention-based NMT model that introduces target information into the attention mechanism to produce high-quality alignments, halving the AER with an absolute improvement of 19.1%.
Learning When to Attend for Neural Machine Translation
TLDR
A novel attention model is proposed that can determine when the decoder should attend to source words and when it should not, achieving an improvement of 0.8 BLEU over a state-of-the-art baseline.
Fine-grained attention mechanism for neural machine translation
Look-Ahead Attention for Generation in Neural Machine Translation
TLDR
A novel look-ahead attention mechanism for generation in NMT, which aims at directly capturing the dependency relationship between target words, is proposed.
A Simple and Effective Approach to Coverage-Aware Neural Machine Translation
TLDR
This work offers a simple and effective method to seek a better balance between model confidence and length preference for Neural Machine Translation (NMT), which is robust to large beam sizes and not well studied in previous work.
Interactive Attention for Neural Machine Translation
TLDR
A new attention mechanism, called INTERACTIVE ATTENTION, is proposed; it models the interaction between the decoder and the representation of the source sentence during translation through both reading and writing operations, and achieves significant improvements over both the previous attention-based NMT baseline and some state-of-the-art variants of attention-based NMT.

References

Showing 1-10 of 35 references
Agreement-Based Joint Training for Bidirectional Attention-Based Neural Machine Translation
TLDR
This work proposes agreement-based joint training for bidirectional attention-based end-to-end neural machine translation, which encourages the two complementary models to agree on word alignment matrices on the same training data.
Effective Approaches to Attention-based Neural Machine Translation
TLDR
A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
Incorporating Structural Alignment Biases into an Attentional Neural Translation Model
TLDR
The attentional neural translation model is extended to include structural biases from word based alignment models, including positional bias, Markov conditioning, fertility and agreement over translation directions.
Neural Machine Translation by Jointly Learning to Align and Translate
TLDR
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Minimum Risk Training for Neural Machine Translation
TLDR
Experiments show that the proposed minimum risk training approach achieves significant improvements over maximum likelihood estimation on a state-of-the-art neural machine translation system across various language pairs.
Recurrent Continuous Translation Models
We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences.
Implicit Distortion and Fertility Models for Attention-based Encoder-Decoder NMT Model
TLDR
This work proposes new variants of the attention-based encoder-decoder model and compares them with other models on machine translation, aiming to resolve problems caused by the lack of distortion and fertility models.
On Using Very Large Target Vocabulary for Neural Machine Translation
TLDR
It is shown that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary.
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions about the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and target sentences, which made the optimization problem easier.
Contrastive Unsupervised Word Alignment with Non-Local Features
TLDR
This work proposes a contrastive approach that aims to differentiate observed training examples from noises and uses top-n alignments to approximate the expectations with respect to posterior distributions, which allows for efficient and accurate calculation of expectations of non-local features.