Analyzing Multi-Task Learning for Abstractive Text Summarization

  title={Analyzing Multi-Task Learning for Abstractive Text Summarization},
  author={Frederic Kirstein and Jan Philip Wahle and Terry Ruas and Bela Gipp},
Despite the recent success of multi-task learning and pre-finetuning for natural language understanding, few works have studied the effects of task families on abstractive text summarization. Task families are a form of task grouping during the pre-finetuning stage to learn common skills, such as reading comprehension. To close this gap, we analyze the influence of multi-task learning strategies using task families for the English abstractive text summarization task. We group tasks into one of… 



Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning

ExMIX (Extreme Mixture): a massive collection of 107 supervised NLP tasks across diverse domains and task-families is introduced, and a model pre-trained using a multi-task objective of self-supervised span denoising and supervised EXMIX is proposed.

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

This work proposes pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective, PEGASUS, and demonstrates it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores.

A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks

A hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks achieves state-of-the-art results on a number of tasks, namely Named Entity Recognition, Entity Mention Detection and Relation Extraction without hand-engineered features or external NLP tools like syntactic parsers.

Abstractive Summarization of Reddit Posts with Multi-level Memory Networks

This work collects Reddit TIFU dataset, consisting of 120K posts from the online discussion forum Reddit, and proposes a novel abstractive summarization model named multi-level memory networks (MMN), equipped with multi- level memory to store the information of text from different levels of abstraction.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

A Neural Attention Model for Abstractive Sentence Summarization

This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.

The Factual Inconsistency Problem in Abstractive Text Summarization: A Survey

This survey focuses on presenting a comprehensive review of fact-specific evaluation methods and text summarization models proposed by Seq2Seq framework to achieve the goal of generating more abstractive summaries by learning to map input text to output text.

WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization

A method for direct crosslingual summarization without requiring translation at inference time is proposed by leveraging synthetic data and Neural Machine Translation as a pre-training step, which significantly outperforms the baseline approaches, while being more cost efficient during inference.

Get To The Point: Summarization with Pointer-Generator Networks

A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways, using a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator.