Language Tokens: A Frustratingly Simple Approach Improves Zero-Shot Performance of Multilingual Translation

Authors: Muhammad N. ElNokrashy, Amr Hendy, Mohamed Maher, Mohamed Afify, Hany Hassan Awadalla

This paper proposes a simple yet effective method to improve direct (X-to-Y) translation in both cases: zero-shot and when direct data is available. We modify the input tokens at both the encoder and the decoder to include signals for the source and target languages. We show a performance gain both when training from scratch and when finetuning a pretrained model with the proposed setup. In the experiments, our method shows a gain of nearly 10.0 BLEU points on in-house datasets depending on the checkpoint…
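
To make "signals for the source and target languages at both the encoder and decoder" concrete, here is a minimal sketch of one possible input layout. The tag strings (`<src:de>`, `<tgt:fr>`), their order, and the function name `add_language_tokens` are hypothetical; the paper's exact token format may differ.

```python
from typing import List, Tuple


def add_language_tokens(src_tokens: List[str], tgt_tokens: List[str],
                        src_lang: str, tgt_lang: str) -> Tuple[List[str], List[str]]:
    """Prefix both encoder and decoder inputs with source and target language tags.

    Hypothetical layout for illustration; not the paper's exact token scheme.
    """
    src_tag = f"<src:{src_lang}>"  # assumed tag format
    tgt_tag = f"<tgt:{tgt_lang}>"  # assumed tag format
    # Encoder side: the model sees which language it reads and which it must produce.
    encoder_input = [src_tag, tgt_tag] + src_tokens
    # Decoder side: the same direction signal is repeated before the target tokens.
    decoder_input = [src_tag, tgt_tag] + tgt_tokens
    return encoder_input, decoder_input


enc, dec = add_language_tokens(["Guten", "Morgen"], ["Bonjour"], "de", "fr")
print(enc)  # ['<src:de>', '<tgt:fr>', 'Guten', 'Morgen']
print(dec)  # ['<src:de>', '<tgt:fr>', 'Bonjour']
```

Putting the direction signal on both sides means the decoder does not have to recover the target language solely through attention to the encoder, which is one plausible reason such tags help in zero-shot directions.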

Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation

This work discretizes the encoder output latent space of multilingual models by assigning encoder states to entries in a codebook, which in effect represents source sentences in a new artificial language, and discovers that using a similar bridge language increases knowledge-sharing among the remaining languages.
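
The core discretization step (assigning each encoder state to a codebook entry) can be sketched as nearest-neighbour vector quantization. Euclidean distance, the toy 2-D codebook, and the function names are assumptions for illustration; how the codebook is learned is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(state: Vector, codebook: List[Vector]) -> int:
    """Return the index of the codebook entry closest (squared Euclidean) to `state`."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((s - c) ** 2 for s, c in zip(state, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def discretize(states: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map every encoder state to a code index, turning a source sentence
    into a sequence of discrete symbols (the 'artificial language')."""
    return [nearest_code(s, codebook) for s in states]


# Toy codebook and encoder states (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
states = [[0.9, 1.2], [0.1, -0.1], [-0.8, 0.7]]
print(discretize(states, codebook))  # [1, 0, 2]
```

Sentences in different languages that map to the same code sequence share a representation, which is the mechanism through which knowledge-sharing is meant to happen.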

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

Experimental results show that LAMASSU not only drastically reduces the model size but also outperforms monolingual ASR and bilingual ST models.

Subword Segmentation and a Single Bridge Language Affect Zero-Shot Neural Machine Translation

It is found that language-specific subword segmentation results in less subword copying at training time and leads to better zero-shot performance than jointly trained segmentation. Training through a single bridge language such as English biases zero-shot output toward English, and this bias can be effectively reduced with even a small amount of parallel data in some of the non-English pairs.

Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation

It is argued that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics, and this bottleneck is overcome through language-specific components and deeper NMT architectures.
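
As one illustration of a "language-specific component", here is a sketch of layer normalization whose gain and bias are selected by target language while the normalization statistics stay shared. This particular design, the parameter values, and the function name are assumptions for illustration, not a description of the paper's exact layers.

```python
import math
from typing import Dict, List


def language_aware_layer_norm(x: List[float], lang: str,
                              gains: Dict[str, List[float]],
                              biases: Dict[str, List[float]],
                              eps: float = 1e-5) -> List[float]:
    """Normalize `x`, then apply an affine transform looked up by language.

    Mean/variance computation is shared; only gain and bias are per language
    (assumed design).
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    gain, bias = gains[lang], biases[lang]
    return [g * n + b for g, n, b in zip(gain, normed, bias)]


# Hypothetical parameters for two target languages.
gains = {"fr": [1.0, 1.0], "de": [2.0, 2.0]}
biases = {"fr": [0.0, 0.0], "de": [1.0, 1.0]}
out_fr = language_aware_layer_norm([1.0, 3.0], "fr", gains, biases)
out_de = language_aware_layer_norm([1.0, 3.0], "de", gains, biases)
print(out_fr, out_de)
```

The same hidden state is mapped to different outputs depending on the language, adding capacity per language at the cost of only two small vectors per language.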

Improving Multilingual Translation by Representation and Gradient Regularization

This work proposes a joint approach that regularizes NMT models at both the representation level and the gradient level, and demonstrates that this approach is highly effective both in reducing off-target translations and in improving zero-shot translation performance.

Three Strategies to Improve One-to-Many Multilingual Translation

This work introduces three strategies to improve one-to-many multilingual translation by balancing shared and language-specific features; among them, it proposes dividing the hidden cells of the decoder into shared and language-dependent ones.
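
Dividing decoder hidden cells into shared and language-dependent ones can be sketched as a per-language mask over hidden units. The layout (a shared prefix followed by equal-sized private blocks per language) and the function names are hypothetical.

```python
from typing import List


def language_mask(hidden_size: int, shared_size: int,
                  languages: List[str], lang: str) -> List[int]:
    """0/1 mask over decoder hidden units for target language `lang`.

    Units [0, shared_size) are shared by all languages; the rest are split
    evenly into one private block per language (assumed layout).
    """
    private = hidden_size - shared_size
    if private % len(languages) != 0:
        raise ValueError("private units must divide evenly across languages")
    block = private // len(languages)
    start = shared_size + languages.index(lang) * block
    return [1 if i < shared_size or start <= i < start + block else 0
            for i in range(hidden_size)]


def apply_mask(hidden: List[float], mask: List[int]) -> List[float]:
    """Zero out the units that belong to other languages."""
    return [h * m for h, m in zip(hidden, mask)]


langs = ["de", "fr", "es"]
mask_fr = language_mask(8, 2, langs, "fr")
print(mask_fr)  # [1, 1, 0, 0, 1, 1, 0, 0]
print(apply_mask([0.5] * 8, mask_fr))
```

The ratio of `shared_size` to the private block size is the knob that trades off shared against language-specific capacity.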

Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder

This paper presents the first attempt at building a multilingual Neural Machine Translation framework under a unified approach, in which information shared among languages helps in translating individual language pairs. It also points out a novel way to make use of monolingual data with Neural Machine Translation.

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.
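
The artificial target-language token can be sketched as simple preprocessing over several parallel corpora. The `<2xx>` token shape follows the style used in that work (e.g. `<2es>`); the helper names and the toy corpora are hypothetical.

```python
from typing import Dict, List, Tuple


def add_target_token(source: str, tgt_lang: str) -> str:
    """Prepend an artificial token naming the required target language."""
    return f"<2{tgt_lang}> {source}"


def build_training_pairs(
        corpora: Dict[Tuple[str, str], List[Tuple[str, str]]]) -> List[Tuple[str, str]]:
    """Merge corpora for several language pairs into one training set.

    Only the target language is marked; the source language is left implicit.
    """
    pairs = []
    for (_src_lang, tgt_lang), sentence_pairs in corpora.items():
        for src, tgt in sentence_pairs:
            pairs.append((add_target_token(src, tgt_lang), tgt))
    return pairs


corpora = {
    ("en", "es"): [("Hello", "Hola")],
    ("pt", "en"): [("Obrigado", "Thank you")],
}
print(build_training_pairs(corpora))
print(add_target_token("Obrigado", "es"))  # <2es> Obrigado
```

The last line requests a direction (Portuguese to Spanish) that never appears in the toy training data. This is the zero-shot setting: the model still receives a valid target token and is expected to generalize from the shared vocabulary and parameters.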

Balancing Training for Multilingual Neural Machine Translation

Experiments show that the proposed method, which automatically learns how to weight training data, not only consistently outperforms heuristic baselines in average performance but also offers flexible control over which languages' performance is optimized.
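
For context on what "heuristic baselines" means here, a common heuristic is temperature-based sampling: a language pair with n_i of N total examples is sampled with probability proportional to (n_i / N)^(1/T). The sketch below shows only that baseline, not the paper's learned weighting. The function name and corpus sizes are hypothetical.

```python
from typing import Dict


def temperature_sampling_probs(sizes: Dict[str, int], temperature: float) -> Dict[str, float]:
    """Sampling probability per language pair under temperature scaling.

    T = 1 reproduces proportional sampling; larger T flattens the distribution
    toward uniform, upsampling low-resource pairs.
    """
    total = sum(sizes.values())
    scaled = {lang: (n / total) ** (1.0 / temperature) for lang, n in sizes.items()}
    norm = sum(scaled.values())
    return {lang: s / norm for lang, s in scaled.items()}


sizes = {"high": 900, "low": 100}  # hypothetical corpus sizes
print(temperature_sampling_probs(sizes, 1.0))
print(temperature_sampling_probs(sizes, 5.0))
```

The fixed temperature applies the same trade-off to every language regardless of how it actually affects development-set quality, which is the limitation a learned weighting aims to address.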

The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

The Flores-101 evaluation benchmark is introduced. It consists of 3001 sentences extracted from English Wikipedia, covers a variety of topics and domains, and enables better assessment of model quality on the long tail of low-resource languages, including the evaluation of many-to-many multilingual translation systems.

Facebook AI’s WMT21 News Translation Task Submission

This paper describes Facebook's multilingual model submission to the WMT2021 shared task on news translation: an ensemble of dense and sparse Mixture-of-Experts multilingual translation models, followed by finetuning on in-domain news data and noisy channel reranking.
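
Noisy channel reranking rescores an n-best list by combining the forward model with a reverse (channel) model and a target language model. The sketch below uses a plain weighted sum of log-probabilities; the weights, scores, and class and function names are hypothetical, and the submission's exact formula (e.g. any length normalization) may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    log_p_direct: float   # log p(y | x) from the forward translation model
    log_p_channel: float  # log p(x | y) from a reverse "channel" model
    log_p_lm: float       # log p(y) from a target-side language model


def noisy_channel_score(c: Candidate, w_channel: float, w_lm: float) -> float:
    """Weighted combination of the three scores (weights would be tuned on dev data)."""
    return c.log_p_direct + w_channel * c.log_p_channel + w_lm * c.log_p_lm


def rerank(cands: List[Candidate], w_channel: float = 1.0,
           w_lm: float = 0.5) -> List[Candidate]:
    """Sort candidates from best to worst combined score."""
    return sorted(cands, key=lambda c: noisy_channel_score(c, w_channel, w_lm),
                  reverse=True)


# Hypothetical n-best list: hyp-a wins under the direct model alone,
# but hyp-b explains the source better under the channel model.
nbest = [
    Candidate("hyp-a", log_p_direct=-1.0, log_p_channel=-5.0, log_p_lm=-4.0),
    Candidate("hyp-b", log_p_direct=-1.5, log_p_channel=-2.0, log_p_lm=-3.0),
]
print(rerank(nbest)[0].text)  # hyp-b
```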

COMET: A Neural Framework for MT Evaluation

This framework leverages recent breakthroughs in cross-lingual pretrained language modeling resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality.
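
To show how the source and the reference can both enter the prediction, here is a sketch of the feature combination used by COMET's estimator-style models: the hypothesis embedding h is combined with source s and reference r as [h; r; h⊙s; h⊙r; |h−s|; |h−r|] and passed to a regressor. The toy embeddings are hypothetical, and `linear_score` is a stand-in for the real feed-forward network; the pretrained cross-lingual encoder that produces the embeddings is not shown.

```python
from typing import List

Vec = List[float]


def estimator_features(src: Vec, hyp: Vec, ref: Vec) -> Vec:
    """Concatenate [h; r; h*s; h*r; |h-s|; |h-r|] from sentence embeddings."""
    prod_src = [h * s for h, s in zip(hyp, src)]
    prod_ref = [h * r for h, r in zip(hyp, ref)]
    diff_src = [abs(h - s) for h, s in zip(hyp, src)]
    diff_ref = [abs(h - r) for h, r in zip(hyp, ref)]
    return hyp + ref + prod_src + prod_ref + diff_src + diff_ref


def linear_score(features: Vec, weights: Vec, bias: float) -> float:
    """Stand-in for the feed-forward regressor that predicts a quality score."""
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy 2-D embeddings (hypothetical).
feats = estimator_features(src=[1.0, 2.0], hyp=[3.0, 4.0], ref=[5.0, 6.0])
print(len(feats))  # 12
print(linear_score(feats, [0.1] * 12, bias=0.0))
```

The element-wise product and absolute-difference terms give the regressor direct signals of agreement between the hypothesis and each of the source and reference.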