Corpus ID: 237532456

Translation Transformers Rediscover Inherent Data Domains

Authors: Maksym Del, Elizaveta Korotkova, Mark Fishel
Many works have proposed methods to improve the performance of Neural Machine Translation (NMT) models in domain and multi-domain adaptation scenarios. However, an understanding of how NMT baselines represent text domain information internally is still lacking. Here we analyze the sentence representations learned by NMT Transformers and show that these explicitly include information on text domains, even after seeing only the input sentences without domain labels. Furthermore, we show that this…


Distilling Multiple Domains for Neural Machine Translation
This paper proposes a framework for training a single multi-domain neural machine translation model that can translate several domains without increasing inference time or memory usage, and shows that this model can improve translation on both high- and low-resource domains over strong multi-domain baselines.
Unsupervised Domain Clusters in Pretrained Language Models
It is shown that massive pre-trained language models implicitly learn sentence representations that cluster by domain without supervision, which suggests a simple data-driven definition of domains in textual data. Domain data selection methods based on such models are proposed, which require only a small set of in-domain monolingual data.
Multi-Domain Neural Machine Translation
An approach to neural machine translation (NMT) is presented that supports multiple domains in a single model and allows switching between the domains when translating; the approach is shown to yield significant translation quality gains over fine-tuning.
Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks
It is consistently found that multi-phase adaptive pretraining offers large gains in task performance, and it is shown that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. It can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Unsupervised Cross-lingual Representation Learning at Scale
It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.
Neural Machine Translation by Jointly Learning to Align and Translate
It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this architecture by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
Attention is All you Need
The Transformer, a new simple network architecture based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed. It is shown to generalize well to other tasks by applying it successfully to English constituency parsing with both large and limited training data.
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
SentencePiece is a language-independent subword tokenizer and detokenizer designed for neural text processing; the authors find that it is possible to achieve comparable accuracy to direct subword training from raw sentences.
A Call for Clarity in Reporting BLEU Scores
Pointing to the success of the parsing community, the paper suggests that machine translation researchers settle upon the BLEU scheme, which does not allow for user-supplied reference processing, and provides a new tool, SACREBLEU, to facilitate this.