Prompt Gating: A Parameter Efficient Tuning Method for Zero-Shot Multi-Source Translation

Xuancheng Huang, Zijun Liu, Peng Li, Maosong Sun, and Yang Liu
Multi-source translation (MST), which typically receives multiple source sentences with the same meaning in different languages, has been shown to be superior to single-source translation. Since multi-source parallel data is limited, it remains a challenge to take full advantage of single-source data and limited multi-source data so that models perform well when given as many sources as possible. Unlike previous work, which is mostly devoted to supervised scenarios, we focus on zero-shot MST…



Transfer Learning for Sequence Generation: from Single-source to Multi-source

Proposes a two-stage finetuning method to alleviate the pretrain-finetune discrepancy, along with a novel MSG model with a fine encoder that learns better representations for MSG tasks.

Input Combination Strategies for Multi-Source Transformer Decoder

This paper proposes four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical, and evaluates the methods on tasks of multimodal translation and translation with multiple source languages.

Multilingual Denoising Pre-training for Neural Machine Translation

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks, presenting mBART…

Multi-Source Neural Machine Translation with Missing Data

This study focuses on the use of incomplete multilingual corpora in multi-encoder NMT and mixture of NMT experts and examines a very simple implementation where missing source translations are replaced by a special symbol.

Ensemble Learning for Multi-Source Neural Machine Translation

This paper proposes several methods with different degrees of parameterization to combine individual predictions of NMT systems so that they mutually compensate for each other’s mistakes and improve overall performance, finding that the biggest improvements can be obtained from a context-dependent weighting scheme for multi-source ensembles.

Towards a Unified View of Parameter-Efficient Transfer Learning

This paper reframes state-of-the-art parameter-efficient transfer learning methods as modifications to specific hidden states in pretrained models and defines a set of design dimensions along which different methods vary, achieving results comparable to fine-tuning all parameters on all four tasks.

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Prefix-tuning is proposed as a lightweight alternative to fine-tuning for natural language generation tasks: it keeps language model parameters frozen and instead optimizes a sequence of continuous task-specific vectors, called the prefix.
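The core idea can be sketched numerically: frozen projection weights stand in for the pretrained model, and the only trainable parameters are prefix vectors prepended to the attention keys and values. This is a minimal single-head toy illustration (dimensions, initialization scale, and the absence of the paper's reparameterization MLP are all simplifying assumptions), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len, prefix_len = 8, 5, 3

# Frozen pretrained projection weights (never updated during tuning).
W_q = rng.standard_normal((d, d))
W_k = rng.standard_normal((d, d))
W_v = rng.standard_normal((d, d))

# The only trainable parameters: continuous prefix vectors prepended
# to the attention keys and values.
prefix_k = rng.standard_normal((prefix_len, d)) * 0.1
prefix_v = rng.standard_normal((prefix_len, d)) * 0.1

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prefix_attention(x):
    """Single-head attention where trainable prefix vectors extend K and V."""
    q = x @ W_q
    k = np.concatenate([prefix_k, x @ W_k], axis=0)  # (prefix_len + seq_len, d)
    v = np.concatenate([prefix_v, x @ W_v], axis=0)
    scores = softmax(q @ k.T / np.sqrt(d))           # queries attend to prefix too
    return scores @ v                                # (seq_len, d)

x = rng.standard_normal((seq_len, d))
out = prefix_attention(x)
```

Because only `prefix_k` and `prefix_v` receive gradients, the number of tuned parameters per task is tiny relative to the frozen model.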

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Parameter-Efficient Transfer Learning for NLP

To demonstrate the adapters' effectiveness, the recently proposed BERT Transformer model is transferred to 26 diverse text classification tasks, including the GLUE benchmark, where adapters attain near state-of-the-art performance while adding only a few parameters per task.
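The adapter mechanism itself is a small bottleneck module inserted into each frozen Transformer sublayer. The sketch below shows the widely used residual bottleneck form (toy dimensions and ReLU nonlinearity are assumptions; the paper's actual module details may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 16, 4  # toy sizes; real setups use e.g. 768 -> 64

# Frozen sublayer output stands in for BERT's hidden states.
hidden = rng.standard_normal((2, d_model))

# Trainable adapter parameters: down-projection then up-projection,
# initialized near zero so the adapter starts close to the identity.
W_down = rng.standard_normal((d_model, d_bottleneck)) * 0.01
W_up = rng.standard_normal((d_bottleneck, d_model)) * 0.01

def adapter(h):
    """Bottleneck adapter with a residual connection."""
    z = np.maximum(h @ W_down, 0.0)  # ReLU in the bottleneck
    return h + z @ W_up              # residual keeps the frozen model's signal

out = adapter(hidden)
```

Per task, only the two small projection matrices are stored, which is why the per-task parameter cost stays at a few percent of the full model.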

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.