Improving Neural Machine Translation Models with Monolingual Data

Rico Sennrich, Barry Haddow, Alexandra Birch
Neural Machine Translation (NMT) has obtained state-of-the-art performance for several language pairs, while only using parallel data for training. Target-side monolingual data plays an important role in boosting fluency for phrase-based statistical machine translation, and we investigate the use of monolingual data for NMT. In contrast to previous work, which combines NMT models with separately trained language models, we note that encoder-decoder NMT architectures already have the capacity to…

Joint Training for Neural Machine Translation Models with Monolingual Data

Experimental results on Chinese-English and English-German translation tasks show that the proposed approach can simultaneously improve the translation quality of source-to-target and target-to-source models, significantly outperforming strong baseline systems that are enhanced with monolingual data for model training, including back-translation.

Semi-Supervised Learning for Neural Machine Translation

This work proposes a semi-supervised approach for training NMT models on the concatenation of labeled (parallel corpora) and unlabeled (monolingual corpora) data, reconstructing the monolingual corpora with an autoencoder in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively.

Multi-task Learning for Multilingual Neural Machine Translation

This work proposes a multi-task learning (MTL) framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data, and shows the effectiveness of MTL over pre-training approaches for both NMT and cross-lingual transfer learning NLU tasks.

Improving Neural Machine Translation on resource-limited pairs using auxiliary data of a third language

The experiments show that, in some cases, the proposed approach to subword-units performs better than BPE (Byte pair encoding) and that auxiliary language-pairs and monolingual data can help improve the performance of languages with limited resources.

Using Target-side Monolingual Data for Neural Machine Translation through Multi-task Learning

This work proposes to modify the decoder in a neural sequence-to-sequence model to enable multi-task learning for two strongly related tasks: target-side language modeling and translation.

Semi-Supervised Neural Machine Translation with Language Models

This work proposes an approach of transferring knowledge from separately trained language models to translation systems and investigates several techniques to improve translation quality when there is a lack of parallel data and computational resources.

Exploiting Monolingual Data at Scale for Neural Machine Translation

This work studies how to use both the source-side and target-side monolingual data for NMT, and proposes an effective strategy leveraging both of them.

Can Monolingual Embeddings Improve Neural Machine Translation?

This paper presents ways to directly feed an NMT network with external word embeddings trained on monolingual source data, thus enabling a virtually infinite source vocabulary.

Exploiting Source-side Monolingual Data in Neural Machine Translation

Two approaches to make full use of source-side monolingual data in NMT are proposed: a self-learning algorithm that generates large-scale synthetic parallel data for NMT training, and a multi-task learning framework that uses two NMTs to predict the translation and the reordered source-side monolingual sentences simultaneously.



On Using Monolingual Corpora in Neural Machine Translation

This work investigates how to leverage abundant monolingual corpora for neural machine translation to improve results for En-Fr and En-De translation and extends to high resource languages such as Cs-En and De-En.

Domain Adaptation for Statistical Machine Translation with Monolingual Resources

This work exploits large but cheap monolingual in-domain data, in either the source or the target language, by translating the monolingual adaptation data into the counterpart language to synthesize a bilingual corpus.

Stanford Neural Machine Translation Systems for Spoken Language Domains

This work further explores the effectiveness of NMT in spoken language domains by participating in the MT track of the IWSLT 2015 and demonstrates that using an existing NMT framework can achieve competitive results in the aforementioned scenarios when translating from English to German and Vietnamese.

Neural Machine Translation by Jointly Learning to Align and Translate

It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

On Using Very Large Target Vocabulary for Neural Machine Translation

It is shown that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary.
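The idea of normalizing over only a small subset of the target vocabulary can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the helper names `candidate_set` and `softmax_over`, the toy lexicon, and the rule for picking the subset (a few frequent words plus lexicon entries for the source words). The summary above does not say how the subset is chosen.

```python
import math

def candidate_set(source_words, lexicon, frequent_ids):
    """Hypothetical subset rule: frequent target ids plus lexicon hits."""
    ids = set(frequent_ids)
    for word in source_words:
        ids.update(lexicon.get(word, ()))
    return sorted(ids)

def softmax_over(candidates, logits):
    """Normalize logits over the candidate ids only, not the full vocabulary."""
    top = max(logits[i] for i in candidates)
    exps = {i: math.exp(logits[i] - top) for i in candidates}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Toy setup: a target vocabulary of 8 word ids with made-up decoder logits.
logits = [0.1, 0.2, 9.0, 0.3, 0.0, 2.0, 1.0, 0.5]
lexicon = {"haus": [5], "klein": [6, 7]}  # assumed source-word -> target-id map
frequent_ids = [0, 1]                     # assumed always-included frequent words

candidates = candidate_set(["klein", "haus"], lexicon, frequent_ids)
probs = softmax_over(candidates, logits)
best = max(probs, key=probs.get)
```

Only the candidate ids are scored, so the cost of the normalization depends on the subset size rather than on the full vocabulary. Word id 2 has the largest raw logit here but can never be produced, because it is outside the subset.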

Investigations on large-scale lightly-supervised training for statistical machine translation

This paper proposes to apply lightly-supervised training to produce additional parallel data: large amounts of monolingual data are translated with an SMT system, and the translations are used as additional training data.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Montreal Neural Machine Translation Systems for WMT’15

The Montreal Institute for Learning Algorithms (MILA) submission to WMT’15 evaluates this new approach to NMT on a greater variety of language pairs, using the RNNsearch architecture, which adds an attention mechanism to the encoder-decoder.

Investigations on Translation Model Adaptation Using Monolingual Data

Improvements of up to 0.5 BLEU were observed with respect to a very competitive baseline trained on more than 280M words of human translated parallel data.

Effective Approaches to Attention-based Neural Machine Translation

A global approach which always attends to all source words and a local one that only looks at a subset of source words at a time are examined, demonstrating the effectiveness of both approaches on the WMT translation tasks between English and German in both directions.
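The difference between the global and local approaches can be sketched in a few lines. This is a simplified illustration, not the paper's exact formulation: it assumes dot-product scoring, a window center that is supplied by the caller, and a plain softmax inside the window. The function names and toy vectors are hypothetical.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def global_attention(query, source_states):
    """Global: weights are spread over every source position."""
    return softmax([dot(query, h) for h in source_states])

def local_attention(query, source_states, center, half_width):
    """Local: weights are nonzero only inside a window around `center`."""
    lo = max(0, center - half_width)
    hi = min(len(source_states), center + half_width + 1)
    window = softmax([dot(query, h) for h in source_states[lo:hi]])
    weights = [0.0] * len(source_states)
    weights[lo:hi] = window
    return weights

def context(weights, source_states):
    """Weighted sum of source states, one value per hidden dimension."""
    dim = len(source_states[0])
    return [sum(w * h[i] for w, h in zip(weights, source_states))
            for i in range(dim)]

# Toy encoder states (5 source positions, hidden size 2) and one decoder query.
source_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
query = [1.0, 0.5]

global_w = global_attention(query, source_states)
local_w = local_attention(query, source_states, center=2, half_width=1)
local_ctx = context(local_w, source_states)
```

With `center=2` and `half_width=1`, the local variant scores only positions 1 to 3, so its cost per target word depends on the window size rather than the source length.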