Corpus ID: 226965220

Language Models not just for Pre-training: Fast Online Neural Noisy Channel Modeling

Shruti Bhosale, Kyra Yee, Sergey Edunov and Michael Auli. In: Conference on Machine Translation.
Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks. On the other hand, traditional machine translation has a long history of leveraging unlabeled data through noisy channel modeling. The same idea has recently been shown to achieve strong improvements for neural machine translation. Unfortunately, naïve noisy channel modeling with modern sequence-to-sequence models is up to an order of magnitude slower than…
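The noisy channel idea the abstract refers to follows Bayes' rule, p(y|x) ∝ p(x|y) · p(y): candidate translations y for a source x are rescored with a channel model p(x|y) and a target-side language model p(y), usually combined with the direct model p(y|x). The following is a minimal reranking sketch; the weights, the scoring functions, and the toy log-probabilities are illustrative assumptions, not values from the paper.

```python
# Noisy channel reranking (sketch). Bayes' rule: p(y|x) ∝ p(x|y) · p(y).
# Candidates are rescored by a weighted sum of direct-model, channel-model,
# and language-model log-probabilities. Weights are illustrative.

def rerank(source, candidates, direct_lp, channel_lp, lm_lp,
           w_channel=0.5, w_lm=0.5):
    """Return candidates sorted by combined noisy-channel score.

    direct_lp(x, y)  -> log p(y|x)   (direct translation model)
    channel_lp(x, y) -> log p(x|y)   (channel model)
    lm_lp(y)         -> log p(y)     (target-side language model)
    """
    def score(y):
        return (direct_lp(source, y)
                + w_channel * channel_lp(source, y)
                + w_lm * lm_lp(y))
    return sorted(candidates, key=score, reverse=True)

# Toy example with hard-coded log-probabilities (hypothetical values):
direct = {"a": -1.0, "b": -1.2}
channel = {"a": -3.0, "b": -0.5}
lm = {"a": -2.0, "b": -0.8}

best = rerank("src", ["a", "b"],
              lambda x, y: direct[y],
              lambda x, y: channel[y],
              lambda y: lm[y])
```

Here candidate "b" wins despite a slightly worse direct score, because the channel model and LM both prefer it; this is the mechanism by which target-side unlabeled data influences the translation.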


Amortized Noisy Channel Neural Machine Translation

This paper studies whether it is possible to build an amortized noisy channel NMT model such that greedy decoding at inference time matches beam search reranking (BSR) in terms of reward and translation quality.

Facebook AI’s WMT20 News Translation Task Submission

Facebook AI’s submission to the WMT20 shared news translation task focuses on the low-resource setting and participates in two language pairs, Tamil and Inuktitut, where out-of-domain bitext and monolingual data are limited.

Facebook AI’s WMT21 News Translation Task Submission

Describes Facebook’s multilingual model submission to the WMT2021 shared task on news translation: an ensemble of dense and sparse Mixture-of-Experts multilingual translation models, followed by finetuning on in-domain news data and noisy channel reranking.

Survey of Low-Resource Machine Translation

A survey covering the state of the art in low-resource machine translation (MT) research is presented, along with a description of the techniques evaluated by researchers in several recent shared tasks in low-resource MT.

Simple and Effective Noisy Channel Modeling for Neural Machine Translation

This work pursues an alternative approach based on standard sequence-to-sequence models which utilize the entire source; these models perform remarkably well as channel models, even though they have neither been trained on, nor designed to factor over, incomplete target sentences.

The Neural Noisy Channel

Experimental results on abstractive sentence summarisation, morphological inflection, and machine translation show that noisy channel models outperform direct models, and that they significantly benefit from increased amounts of unpaired output data that direct models cannot easily use.

Pre-trained language model representations for language generation

This paper examines different strategies to integrate pre-trained representations into sequence-to-sequence models, applies them to neural machine translation and abstractive summarization, and finds that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%.

Understanding Back-Translation at Scale

This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences, finding that in all but resource-poor settings, back-translations obtained via sampling or noised beam outputs are most effective.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Improving Neural Machine Translation Models with Monolingual Data

This work pairs monolingual training data with automatic back-translations and treats them as additional parallel training data, obtaining substantial improvements on the WMT’15 English↔German task and the low-resource IWSLT’14 Turkish→English task.
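The back-translation recipe summarized above can be sketched in a few lines: target-side monolingual sentences are translated into the source language by a reverse (target→source) model, and the resulting synthetic pairs are simply mixed with the real bitext. `reverse_translate` below is a hypothetical stand-in for a trained reverse model, not part of any real API.

```python
# Back-translation (sketch): pair each monolingual target sentence with a
# machine-generated synthetic source, then append the pairs to the bitext.

def back_translate(mono_target, reverse_translate):
    """Pair each monolingual target sentence with a synthetic source."""
    return [(reverse_translate(y), y) for y in mono_target]

def build_training_set(bitext, mono_target, reverse_translate):
    """Real parallel data followed by synthetic (source, target) pairs."""
    return bitext + back_translate(mono_target, reverse_translate)

# Toy usage with a dummy "model" that just tags its input:
bitext = [("ein Haus", "a house")]
synthetic = build_training_set(bitext, ["a dog"], lambda y: f"<bt> {y}")
```

The target side of every synthetic pair is genuine human text, which is why this trick improves fluency: the decoder is trained on real sentences even when the source side is machine-generated.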

Simple Fusion: Return of the Language Model

This work investigates an alternative simple method to use monolingual data for NMT training that combines the scores of a pre-trained and fixed language model (LM) with the scores of a translation model (TM), while the TM is trained from scratch.
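One way to combine a fixed LM with a TM, as the summary describes, is to multiply the two token distributions and renormalize at each decoding step (the "PostNorm" flavor of simple fusion). The sketch below shows only that combination step on toy distributions; the exact parameterization in the paper is an assumption here.

```python
# Simple fusion (sketch): multiply the translation model's next-token
# distribution by a fixed pre-trained LM's distribution, then renormalize.
# Toy vocabulary of two tokens; probabilities are illustrative.

def fuse(tm_probs, lm_probs):
    """PostNorm-style fusion: elementwise product, renormalized."""
    joint = [p * q for p, q in zip(tm_probs, lm_probs)]
    z = sum(joint)
    return [p / z for p in joint]

fused = fuse([0.5, 0.5], [0.9, 0.1])
```

Because the LM is fixed, the TM learns during training to model only what the LM cannot, which is how monolingual knowledge enters the system without back-translation.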

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Revisiting Self-Training for Neural Sequence Generation

An empirical study on standard machine translation and text summarization benchmarks shows that noisy self-training effectively utilizes unlabeled data and improves the performance of the supervised baseline by a large margin.
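A single round of the noisy self-training loop described above can be sketched as: pseudo-label unlabeled inputs with the current model, inject noise on the input side, and retrain on the union of real and pseudo-labeled data. `model` and `perturb` are hypothetical stand-ins for a trained model and an input-noising function.

```python
# Noisy self-training (sketch): the supervised model labels unlabeled
# inputs, the inputs are perturbed, and the pairs are added to the
# training set for the next round.

def self_training_round(labeled, unlabeled, model, perturb):
    """One round: real pairs plus (noised input, pseudo-label) pairs."""
    pseudo = [(perturb(x), model(x)) for x in unlabeled]
    return labeled + pseudo

# Toy usage: "model" uppercases, "perturb" appends noise marker.
augmented = self_training_round(
    [("x1", "y1")], ["x2"],
    model=lambda x: x.upper(),
    perturb=lambda x: x + "!",
)
```

Note that pseudo-labels are computed on the clean input while training sees the noised input; this asymmetry is what makes the noise act as a regularizer rather than corrupting the targets.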

On Using Monolingual Corpora in Neural Machine Translation

This work investigates how to leverage abundant monolingual corpora for neural machine translation, improving results for En-Fr and En-De translation and extending to high-resource languages such as Cs-En and De-En.