Dense Information Flow for Neural Machine Translation

@inproceedings{Shen2018DenseIF,
  title={Dense Information Flow for Neural Machine Translation},
  author={Yanyao Shen and Xu Tan and Di He and Tao Qin and Tie-Yan Liu},
  booktitle={NAACL},
  year={2018}
}
Recently, neural machine translation has achieved remarkable progress by introducing well-designed deep neural networks into its encoder-decoder framework. From the optimization perspective, residual connections are adopted to improve learning performance for both encoder and decoder in most of these deep architectures, and advanced attention connections are applied as well. Inspired by the success of the DenseNet model in computer vision problems, in this paper, we propose a densely connected… 
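
Since only the truncated abstract is available here, a minimal sketch of the core idea may help: DenseNet-style connectivity applied to an NMT encoder, where each layer consumes the concatenation of all preceding layers' outputs. This is an illustrative assumption, not the authors' implementation; PyTorch, the class name DenselyConnectedEncoder, and the hyperparameters (d_model, growth, num_layers) are all hypothetical.

```python
# Minimal sketch (not the authors' code) of DenseNet-style connectivity in an
# NMT encoder: every layer consumes the concatenation of all earlier layer
# outputs, so information flows densely across depth. Layer internals and all
# hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class DenselyConnectedEncoder(nn.Module):
    def __init__(self, d_model=256, growth=128, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        in_dim = d_model
        for _ in range(num_layers):
            # Each "layer" maps the concatenated history to a fixed-size
            # feature block (the "growth rate"), as in DenseNet.
            self.layers.append(nn.Sequential(nn.Linear(in_dim, growth), nn.ReLU()))
            in_dim += growth  # the next layer sees all previous outputs

    def forward(self, x):
        # x: (batch, seq_len, d_model) source token embeddings
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=-1))  # dense connection
            features.append(out)
        # A decoder could attend to this full concatenation rather than only
        # the top layer, in the spirit of dense information flow.
        return torch.cat(features, dim=-1)


if __name__ == "__main__":
    enc = DenselyConnectedEncoder()
    src = torch.randn(2, 7, 256)   # dummy batch of source embeddings
    print(enc(src).shape)          # torch.Size([2, 7, 768]) = 256 + 4 * 128
```

The same pattern can be mirrored on the decoder side, with attention reading from the concatenated encoder states rather than only the final layer; how the paper wires these connections in detail is not recoverable from the abstract above.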

Citations

Improving Neural Machine Translation Model with Deep Encoding Information
TLDR
A novel neural machine translation model which can fully exploit the deep encoding information is proposed, and three different aggregation strategies, including parallel layer, multi-layer, and dynamic layer encoding information aggregation, are designed.
Multiscale Collaborative Deep Models for Neural Machine Translation
TLDR
This paper presents a MultiScale Collaborative (MSC) framework to ease the training of NMT models that are substantially deeper than those used previously and provides empirical evidence showing that the MSC nets are easy to optimize and can obtain improvements of translation quality from considerably increased depth.
Residual Tree Aggregation of Layers for Neural Machine Translation
TLDR
A residual tree aggregation of layers for the Transformer (RTAL) is proposed, which helps to fuse information across layers by constructing a post-order binary tree.
Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation
TLDR
The concept of layer-wise coordination for NMT is proposed, which explicitly coordinates the learning of hidden representations of the encoder and decoder together layer by layer, gradually from low level to high level.
Dynamic Layer Aggregation for Neural Machine Translation with Routing-by-Agreement
TLDR
This paper proposes to use routing-by-agreement strategies to aggregate layers dynamically and shows that the proposed approach consistently outperforms the strong baseline model and a representative static aggregation model.
Efficient Bidirectional Neural Machine Translation
TLDR
An efficient method to generate a sequence in both left-to-right and right-to-left manners using a single encoder and decoder, combining the advantages of both generation directions, is proposed.
Layer-Wise Multi-View Decoding for Improved Natural Language Generation
TLDR
This work proposes layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences.
Learning Source Phrase Representations for Neural Machine Translation
TLDR
This paper proposes an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations and incorporates the generated phrase representations into the Transformer translation model to enhance its ability to capture long-distance relationships.
Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation
TLDR
An empirical study on the encoder and the decoder in NMT is conducted, taking Transformer as an example, and it is found that the decoder handles an easier task than the encoder in NMT.
Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input
TLDR
This paper proposes two methods to enhance the decoder inputs so as to improve NAT models: one directly leverages a phrase table generated by conventional SMT approaches to translate source tokens to target tokens, and the other transforms source-side word embeddings to target-side word embeddings through sentence-level alignment and word-level adversary learning.

References

SHOWING 1-10 OF 24 REFERENCES
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
Neural Machine Translation by Jointly Learning to Align and Translate
TLDR
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
TLDR
GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
Convolutional Sequence to Sequence Learning
TLDR
This work introduces an architecture based entirely on convolutional neural networks, which outperforms the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
Depthwise Separable Convolutions for Neural Machine Translation
TLDR
A new architecture inspired by Xception and ByteNet is introduced, called SliceNet, which enables a significant reduction of the parameter count and amount of computation needed to obtain results like ByteNet, and, with a similar parameter count, achieves new state-of-the-art results.
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
TLDR
Qualitatively, the proposed RNN Encoder‐Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.
Improving Neural Machine Translation Models with Monolingual Data
TLDR
This work pairs monolingual training data with an automatic back-translation, treating it as additional parallel training data, and obtains substantial improvements on the WMT 15 English<->German task and the low-resourced IWSLT 14 Turkish->English task.
Densely Connected Convolutional Networks
TLDR
The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion and has several compelling advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
TLDR
The proposed DenseNets approach achieves state-of-the-art results on urban scene benchmark datasets such as CamVid and Gatech, without any further post-processing module or pretraining, and has far fewer parameters than currently published best entries for these datasets.
Towards Neural Phrase-based Machine Translation
TLDR
This paper explicitly models the phrase structures in output sequences using Sleep-WAke Networks (SWAN), a recently proposed segmentation-based sequence modeling method, and introduces a new layer to perform (soft) local reordering of input sequences to mitigate the monotonic alignment requirement of SWAN.