Multilingual Machine Translation with Hyper-Adapters

  title={Multilingual Machine Translation with Hyper-Adapters},
  author={Christos Baziotis and Mikel Artetxe and James Cross and Shruti Bhosale},
Multilingual machine translation suffers from negative interference across languages. A common solution is to relax parameter sharing with language-specific modules like adapters. However, adapters of related languages are unable to transfer information, and their total number of parameters becomes prohibitively expensive as the number of languages grows. In this work, we overcome these drawbacks using hyper-adapters —hyper-networks that generate adapters from language and layer embeddings… 



Attention is All you Need

A new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely is proposed, which generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

Bleu: a Method for Automatic Evaluation of Machine Translation

This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

Monolingual Adapters for Zero-Shot Neural Machine Translation

We propose a novel adapter layer formalism for adapting multilingual models. They are more parameter-efficient than existing adapter layers while obtaining as good or better performance. The layers

Multilingual Translation with Extensible Multilingual Pretraining and Finetuning

This work shows that multilingual translation models can be created through multilingual finetuning, and demonstrates that pretrained models can been extended to incorporate additional languages without loss of performance.

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

This work sets a milestone by building a single massively multilingual NMT model handling 103 languages trained on over 25 billion examples, and demonstrates effective transfer learning ability, significantly improving translation quality of low-resource languages, while keeping high-resource language translation quality on-par with competitive bilingual baselines.

MAD-G: Multilingual Adapter Generation for Efficient Cross-Lingual Transfer

MAD-G (Multilingual ADapter Generation), which contextually generates language adapters from language representations based on typological features, offers substantial benefits for low-resource languages, particularly on the NER task in low- resource African languages.

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

This paper shows that one can learn adapter parameters for all layers and tasks by generating them using shared hypernetworks, which condition on task, adapter position, and layer id in a transformer model.

Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

This work proposes a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages using a shared wordpiece vocabulary, and introduces an artificial token at the beginning of the input sentence to specify the required target language.

Rethinking the Inception Architecture for Computer Vision

This work is exploring ways to scale up networks in ways that aim at utilizing the added computation as efficiently as possible by suitably factorized convolutions and aggressive regularization.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.