Transfer Learning for Sequence Generation: from Single-source to Multi-source

Xuancheng Huang, Jingfang Xu, Maosong Sun, Yang Liu
Annual Meeting of the Association for Computational Linguistics
Multi-source sequence generation (MSG) is an important class of sequence generation tasks that take multiple sources; examples include automatic post-editing, multi-source translation, and multi-document summarization. Because MSG tasks suffer from data scarcity and recent pretrained models have proven effective for low-resource downstream tasks, transferring pretrained sequence-to-sequence models to MSG tasks is essential. Although directly finetuning pretrained models on MSG tasks…

Prompt Gating: A Parameter Efficient Tuning Method for Zero-Shot Multi-Source Translation

This work proposes Prompt Gating, a simple yet effective parameter-efficient method for multi-source translation (MST) that appends prompts to the model inputs and attaches gates to the extended hidden states of each encoder layer; it shows strong zero-shot transferability and remarkable compositionality.

Stance Detection with a Multi-Target Adversarial Attention Network

An adversarial attention network is proposed that integrates multi-target data by detecting and connecting topic and sentiment information. Its demonstrated effectiveness indicates the importance of topic and sentiment information for stance detection with multi-target data.

An Extensible Plug-and-Play Method for Multi-Aspect Controllable Text Generation

  • Xuancheng Huang, Zijun Liu, Peng Li, Tao Li, Maosong Sun, Yang Liu
  • Computer Science
  • 2022
This work provides a theoretical lower bound for the interference, finds empirically that the interference grows with the number of layers in which prefixes are inserted, and proposes trainable gates that normalize the intervention of prefixes to restrain the growing interference.

MASS: Masked Sequence to Sequence Pre-training for Language Generation

This work proposes MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation tasks; it achieves state-of-the-art accuracy on unsupervised English-French translation, even beating the early attention-based supervised model.
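
As a rough illustration of masked sequence-to-sequence pre-training, the sketch below builds one training example by masking a contiguous span on the encoder side and using that span as the decoder target. The `[MASK]` symbol, the function name `mass_example`, and the span-length choice are assumptions made for illustration; the summary above does not specify them, and the published method includes details this sketch omits.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def mass_example(tokens, span_len, rng):
    """Build one masked seq2seq example from a list of tokens."""
    if not 0 < span_len <= len(tokens):
        raise ValueError("span_len must be between 1 and len(tokens)")
    start = rng.randrange(len(tokens) - span_len + 1)
    end = start + span_len
    # The encoder sees the sentence with one contiguous span masked out.
    encoder_input = tokens[:start] + [MASK] * span_len + tokens[end:]
    # The decoder is trained to reproduce exactly the masked span.
    decoder_target = tokens[start:end]
    return encoder_input, decoder_target, start


tokens = "the cat sat on the mat".split()
enc, tgt, start = mass_example(tokens, span_len=3, rng=random.Random(0))
print(enc, tgt)
```

The encoder input keeps its original length, so positions of unmasked tokens are unchanged.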

Input Combination Strategies for Multi-Source Transformer Decoder

This paper proposes four different input combination strategies for the encoder-decoder attention: serial, parallel, flat, and hierarchical, and evaluates the methods on tasks of multimodal translation and translation with multiple source languages.
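
Two of the four named strategies can be sketched in a few lines, under simplifying assumptions: a single query vector, plain scaled dot-product attention, and no learned projections. Here "flat" is read as attending once over the concatenated states of all sources, and "parallel" as attending to each source independently and summing the resulting contexts. The function names and toy vectors are hypothetical.

```python
import math


def attend(query, states):
    """Scaled dot-product attention of one query over a list of state vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * s for q, s in zip(query, st)) / scale for st in states]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(x - top) for x in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(states[0])
    return [sum(w * st[i] for w, st in zip(weights, states)) for i in range(dim)]


def flat(query, sources):
    """Flat: concatenate the states of all sources and attend once."""
    return attend(query, [st for src in sources for st in src])


def parallel(query, sources):
    """Parallel: attend to each source separately, then sum the contexts."""
    contexts = [attend(query, src) for src in sources]
    return [sum(vals) for vals in zip(*contexts)]


src_a = [[1.0, 0.0], [0.0, 1.0]]  # toy encoder states for source A
src_b = [[0.5, 0.5]]              # toy encoder states for source B
query = [1.0, 0.0]
flat_ctx = flat(query, [src_a, src_b])
par_ctx = parallel(query, [src_a, src_b])
print(flat_ctx, par_ctx)
```

In the flat variant the sources compete for one attention distribution, whereas in the parallel variant each source always contributes a full context vector.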

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.

Multi-source Neural Automatic Post-Editing: FBK’s participation in the WMT 2017 APE shared task

The multi-source neural machine translation (NMT) system submitted by FBK to the WMT 2017 APE shared task is presented; it was the best system submission for this round of the APE shared task in both the en-de and de-en language directions.

Incorporating BERT into Parallel Sequence Decoding with Adapters

This paper takes two different BERT models as the encoder and decoder respectively, and fine-tunes them by introducing simple and lightweight adapter modules, which are inserted between BERT layers and tuned on the task-specific dataset, resulting in a flexible and efficient model.
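
The summary does not describe the adapter design, so the following is only a generic bottleneck adapter, a common form of such modules: down-project the hidden state, apply a nonlinearity, up-project, and add the result back through a residual connection. The names `adapter`, `w_down`, and `w_up` and all weight values are hypothetical.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def adapter(hidden, w_down, w_up):
    """Generic bottleneck adapter: down-project, ReLU, up-project, residual add."""
    bottleneck = [max(0.0, z) for z in matvec(w_down, hidden)]
    update = matvec(w_up, bottleneck)
    return [h + u for h, u in zip(hidden, update)]


hidden = [0.5, -1.0, 2.0, 0.0]
w_down = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.3, 0.0, 0.1]]  # projects 4 -> 2

# With a zero up-projection the adapter leaves the hidden state unchanged,
# so the pretrained layer's output passes through as-is.
w_up_zero = [[0.0, 0.0] for _ in range(4)]  # projects 2 -> 4
identity_out = adapter(hidden, w_down, w_up_zero)

# A nonzero up-projection adds a small learned update to the hidden state.
w_up = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
tuned_out = adapter(hidden, w_down, w_up)
print(identity_out, tuned_out)
```

Only the small projection matrices would be trained in such a setup, which is what makes adapter modules lightweight relative to full fine-tuning.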

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Multilingual Denoising Pre-training for Neural Machine Translation

This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART…

Attention is All you Need

The Transformer, a new simple network architecture based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

Incorporating BERT into Neural Machine Translation

A new algorithm named BERT-fused model is proposed, in which BERT is first used to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms.

Exploring and Predicting Transferability across NLP Tasks

The results show that transfer learning is more beneficial than previously thought, especially when target task data is scarce, and can improve performance even when the source task is small or differs substantially from the target task.