Self-Attentive Residual Decoder for Neural Machine Translation

@inproceedings{Miculicich2018SelfAttentiveRD,
  title={Self-Attentive Residual Decoder for Neural Machine Translation},
  author={Lesly Miculicich and Nikolaos Pappas and Dhananjay Ram and Andrei Popescu-Belis},
  booktitle={NAACL},
  year={2018}
}
Neural sequence-to-sequence networks with attention have achieved remarkable performance for machine translation. One of the reasons for their effectiveness is their ability to capture relevant source-side contextual information at each time-step prediction through an attention mechanism. However, the target-side context is solely based on the sequence model, which, in practice, is prone to a recency bias and lacks the ability to effectively capture non-sequential dependencies among words. To… 
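
The abstract's key idea, attending over all previously generated target words and combining that summary residually with the recurrent decoder state, can be illustrated with a small sketch. The NumPy code below is a minimal illustration under assumptions: the function name, the single learned projection w_att, and the additive way the target summary enters the state are placeholders for exposition, not the authors' exact formulation.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def self_attentive_residual_step(prev_target_embs, decoder_state, w_att):
    # prev_target_embs: (t, d) embeddings of target words generated so far
    # decoder_state:    (d,)   current recurrent decoder hidden state
    # w_att:            (d, d) learned attention projection (assumed shape)
    # Score each previously generated word against the current state.
    scores = prev_target_embs @ (w_att @ decoder_state)   # (t,)
    weights = softmax(scores)                              # (t,)
    target_summary = weights @ prev_target_embs            # (d,)
    # Residual combination: distant target words contribute directly to the
    # next-word prediction instead of passing through every recurrent step,
    # which is what counteracts the recency bias described in the abstract.
    return decoder_state + target_summary

# Toy usage with random vectors
d, t = 8, 5
rng = np.random.default_rng(0)
out = self_attentive_residual_step(rng.normal(size=(t, d)),
                                   rng.normal(size=d),
                                   rng.normal(size=(d, d)))
print(out.shape)  # (8,)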

Neural Machine Translation with Decoding History Enhanced Attention
TLDR
A decoding-history-enhanced attention (DHEA) mechanism is proposed to make the NMT model better at selecting both source-side and target-side information, achieving results comparable with the state of the art.
Middle-Out Decoding
TLDR
This paper proposes a novel middle-out decoder architecture that begins from an initial middle word and simultaneously expands the sequence in both directions, and introduces a dual self-attention mechanism that models complex dependencies between the outputs.
Fast Query-by-Example Speech Search Using Attention-Based Deep Binary Embeddings
TLDR
The proposed self-attentive deep hashing network is effectively trained with three specifically designed objectives (a penalization term, a triplet loss, and a quantization loss) to improve search accuracy and speed for the AWE-based QbE approach in a low-resource scenario.
Neural Machine Translation: A Review and Survey
TLDR
This work traces back the origins of modern NMT architectures to word and sentence embeddings and earlier examples of the encoder-decoder network family and concludes with a survey of recent trends in the field.
Reflective Decoding Network for Image Captioning
TLDR
It is shown that vocabulary coherence between words and the syntactic paradigm of sentences are also important for generating high-quality image captions, and the proposed Reflective Decoding Network (RDN) enhances both the long-sequence dependency and the position perception of words in a caption decoder.
On the use of prior and external knowledge in neural sequence models
TLDR
This thesis proposes the use of various kinds of prior and external knowledge, presents different approaches for integrating them into both the training and inference phases of neural sequence models, and introduces a new means for incorporating prior and external knowledge based on the moment-matching framework.
Images2Poem: Generating Chinese Poetry from Image Streams
TLDR
An Images2Poem model with a selection mechanism and an adaptive self-attention mechanism is proposed for the new multimedia task of generating classical Chinese poetry from image streams; it outperforms baselines on different human evaluation metrics.
Ein Vergleich aktueller Deep-Learning-Architekturen zur Prognose von Prozessverhalten [A Comparison of Current Deep Learning Architectures for Predicting Process Behavior]
TLDR
Results show that recent deep learning architectures achieve competitive, and in some cases better, prediction quality than the approaches previously used in the literature.
Recurrent Neural Network Techniques: Emphasis on Use in Neural Machine Translation
Natural Language Processing (NLP) is the processing and representation of human language in a way that accommodates its use in modern computer technology. Several techniques, including deep…
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
TLDR
It is shown how traditional symbolic statistical machine translation models can still improve neural machine translation while reducing the risk of common pathologies of NMT such as hallucinations and neologisms.
...

References

SHOWING 1-10 OF 52 REFERENCES
Effective Approaches to Attention-based Neural Machine Translation
TLDR
A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
A Context-Aware Recurrent Encoder for Neural Machine Translation
TLDR
This paper proposes a novel context-aware recurrent encoder (CAEncoder), as an alternative to the widely-used bidirectional encoder, such that the future and history contexts can be fully incorporated into the learned source representations.
Memory-enhanced Decoder for Neural Machine Translation
TLDR
The memory in this memory-enhanced RNN decoder is a matrix of pre-determined size, designed to better capture the information important for the decoding process at each time step, yielding the best performance achieved with the same training set.
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short-term dependencies between the source and the target sentence, which made the optimization problem easier.
Neural Machine Translation by Jointly Learning to Align and Translate
TLDR
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
End-To-End Memory Networks
TLDR
A neural network with a recurrent attention model over a possibly large external memory is presented; it is trained end-to-end and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.
Neural Machine Translation with Recurrent Attention Modeling
TLDR
This work improves upon the attention model of Bahdanau et al. (2014) by explicitly modeling the relationship between previous and subsequent attention levels for each word using one recurrent network per input word.
RRA: Recurrent Residual Attention for Sequence Learning
TLDR
The proposed recurrent neural network with residual attention (RRA) yields better performance, faster convergence, and more stable training compared to a standard LSTM network, and shows performance highly competitive with state-of-the-art methods.
Long Short-Term Memory-Networks for Machine Reading
TLDR
A machine reading simulator is presented that processes text incrementally from left to right and performs shallow reasoning with memory and attention; it extends the Long Short-Term Memory architecture with a memory network in place of a single memory cell, offering a way to weakly induce relations among tokens.
Variational Neural Machine Translation
TLDR
This paper builds a neural posterior approximator conditioned on both the source and the target sides, equips it with a reparameterization technique to estimate the variational lower bound, and shows that the proposed variational neural machine translation achieves significant improvements over vanilla neural machine translation baselines.
...