Corpus ID: 209405420

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

@article{Zhang2020PEGASUSPW,
  title={PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization},
  author={Jingqing Zhang and Yao Zhao and Mohammad Saleh and Peter J. Liu},
  journal={ArXiv},
  year={2020},
  volume={abs/1912.08777}
}
Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks, including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore, there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised…
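The abstract is truncated before describing the objective, but the title names it: extracted gap-sentences, i.e. whole sentences are removed from the input document and the model is trained to regenerate them as a pseudo-summary. The sketch below is purely illustrative: the naive sentence splitter, the <mask_1> placeholder, and the unigram-overlap importance score are simplifications standing in for the paper's ROUGE-based sentence selection, not the authors' exact procedure.

```python
import re
from collections import Counter

MASK_TOKEN = "<mask_1>"  # placeholder; the real model uses a dedicated sentence-mask token

def split_sentences(text):
    """Very rough sentence splitter; a real pipeline would use a proper tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def importance(sentence, others):
    """Unigram-overlap proxy for the ROUGE-based selection described in the paper."""
    sent = Counter(sentence.lower().split())
    rest = Counter(" ".join(others).lower().split())
    overlap = sum(min(c, rest[w]) for w, c in sent.items())
    return overlap / max(len(sentence.split()), 1)

def make_gsg_example(document, gap_ratio=0.3):
    """Mask the top-scoring sentences; the input keeps masks, the target is the removed text."""
    sents = split_sentences(document)
    n_gaps = max(1, int(round(gap_ratio * len(sents))))
    scored = sorted(range(len(sents)),
                    key=lambda i: importance(sents[i], sents[:i] + sents[i + 1:]),
                    reverse=True)
    gap_ids = set(scored[:n_gaps])
    inputs = " ".join(MASK_TOKEN if i in gap_ids else s for i, s in enumerate(sents))
    target = " ".join(sents[i] for i in sorted(gap_ids))
    return inputs, target

if __name__ == "__main__":
    doc = ("PEGASUS is pre-trained on large corpora. Important sentences are removed "
           "from the document. The model must generate the removed sentences. "
           "This resembles producing an abstractive summary.")
    x, y = make_gsg_example(doc)
    print("INPUT :", x)
    print("TARGET:", y)
```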
AgreeSum: Agreement-Oriented Multi-Document Summarization
TLDR
This work creates a dataset for AgreeSum and provides annotations on article-summary entailment relations for a subset of the clusters in the dataset, in the hope that these article-summary entailment annotations contribute to the community's effort to improve abstractive summarization faithfulness.
Human-Centered Financial Summarization
Automatic summarization has seen rapid growth in recent years, allowing for efficient handling and processing of the huge number of documents available on the Web. There is also an…
Meta-Transfer Learning for Low-Resource Abstractive Summarization
TLDR
The results demonstrate that the proposed approach achieves state-of-the-art results on 6 corpora in low-resource scenarios, with only 0.7% of the trainable parameters used in previous work.
NCUEE-NLP at MEDIQA 2021: Health Question Summarization Using PEGASUS Transformers
TLDR
This paper describes the model design of the NCUEE-NLP system for the MEDIQA challenge at the BioNLP 2021 workshop, in which PEGASUS transformers are fine-tuned on the downstream summarization task using the collected and processed datasets.
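Entries like this one fine-tune publicly released PEGASUS checkpoints for question summarization. As a hedged illustration of what such a pipeline looks like at inference time, here is a sketch using the Hugging Face transformers library; the checkpoint name google/pegasus-xsum and the example question are assumptions, not the NCUEE-NLP system's actual configuration.

```python
# Sketch only: summarizing a consumer health question with a public PEGASUS checkpoint.
# Requires: pip install transformers torch sentencepiece
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

checkpoint = "google/pegasus-xsum"  # assumed public checkpoint, not the MEDIQA-tuned model
tokenizer = PegasusTokenizer.from_pretrained(checkpoint)
model = PegasusForConditionalGeneration.from_pretrained(checkpoint)

question = (
    "I have been taking ibuprofen every day for two weeks for knee pain. "
    "Is long-term use of ibuprofen safe, and what side effects should I watch for?"
)

# Tokenize, generate with beam search, and decode the summary.
inputs = tokenizer(question, truncation=True, padding="longest", return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=64)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```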
NLM at MEDIQA 2021: Transfer Learning-based Approaches for Consumer Question and Multi-Answer Summarization
TLDR
This work exploits the capabilities of pre-trained transformer models and introduces a transfer learning approach for the abstractive Question Summarization and extractive Multi-Answer Summarization tasks, first pre-training the model on a task-specific summarization dataset and then fine-tuning it for both tasks while incorporating medical entities.
Want To Reduce Labeling Cost? GPT-3 Can Help
Data annotation is a time-consuming and labor-intensive process for many NLP tasks. Although there exist various methods to produce pseudo data labels, they are often task-specific and require a…
AQuaMuSe: Automatically Generating Datasets for Query-Based Multi-Document Summarization
TLDR
This work proposes a scalable approach called AQuaMuSe to automatically mine qMDS examples from question answering datasets and large document corpora, and can generate a dual dataset covering both extractive and abstractive summaries.
Few-Shot Text Generation with Pattern-Exploiting Training
TLDR
This paper adapts Pattern-Exploiting Training (PET), a recently proposed few-shot approach, for fine-tuning generative language models on text generation tasks and shows that the underlying idea can also be applied to text classification tasks.
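Pattern-Exploiting Training reformulates a task as a natural-language pattern fed to a pre-trained model. A minimal sketch of the idea applied to summarization is below; the pattern strings and the helper names are invented for illustration, not the paper's actual patterns or training procedure.

```python
# Illustrative only: PET-style input patterns for few-shot summarization.
# The patterns below are invented examples, not those used in the paper.

def pattern_a(document: str) -> str:
    return f"{document} TL;DR:"

def pattern_b(document: str) -> str:
    return f"Text: {document} Summary:"

PATTERNS = [pattern_a, pattern_b]

def build_few_shot_inputs(documents):
    """Wrap each document with every pattern so a generative LM can be
    fine-tuned (or prompted) on the patternized inputs."""
    return [(p.__name__, p(doc)) for doc in documents for p in PATTERNS]

if __name__ == "__main__":
    docs = ["PEGASUS masks whole sentences during pre-training and learns to regenerate them."]
    for name, text in build_few_shot_inputs(docs):
        print(f"[{name}] {text}")
```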
Improving Text Generation Evaluation with Batch Centering and Tempered Word Mover Distance
TLDR
Two techniques for improving encoding representations for similarity metrics are presented: a batch-mean centering strategy that improves statistical properties, and a computationally efficient tempered Word Mover Distance for better fusion of the information in the contextualized word representations.
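The batch-mean centering idea is straightforward to illustrate: subtract the mean embedding of the batch from every contextualized vector before computing a similarity score. The sketch below uses random NumPy vectors with a shared offset as a toy stand-in for contextual embeddings; it shows only the centering step, not the paper's tempered Word Mover Distance.

```python
import numpy as np

def batch_center(embeddings: np.ndarray) -> np.ndarray:
    """Subtract the batch mean so similarity scores are less dominated
    by a shared offset common to all contextual embeddings."""
    return embeddings - embeddings.mean(axis=0, keepdims=True)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for contextualized word vectors with a large shared offset.
    batch = rng.normal(size=(8, 16)) + 5.0
    centered = batch_center(batch)
    print("raw cosine      :", round(cosine(batch[0], batch[1]), 3))
    print("centered cosine :", round(cosine(centered[0], centered[1]), 3))
```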
A Condense-then-Select Strategy for Text Summarization
TLDR
This work proposes a novel condense-then-select framework for text summarization that helps to avoid the loss of salient information while preserving the high efficiency of sentence-level compression.

References

Showing 1-10 of 60 references
BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization
TLDR
This work presents a novel dataset, BIGPATENT, consisting of 1.3 million records of U.S. patent documents along with human-written abstractive summaries, which has the following properties: i) summaries contain a richer discourse structure with more recurring entities, ii) salient content is evenly distributed in the input, and iii) fewer and shorter extractive fragments are present in the summaries.
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
TLDR
This work proposes the first model for abstractive summarization of single, longer-form documents (e.g., research papers), consisting of a new hierarchical encoder that models the discourse structure of a document and an attentive discourse-aware decoder to generate the summary.
Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
TLDR
A novel abstractive model is proposed which is conditioned on the article's topics and based entirely on convolutional neural networks, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
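For reference, the Transformer's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. A minimal single-head NumPy rendering is below; the shapes and toy inputs are illustrative only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (single head, no mask)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (n_queries, d_v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```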
Teaching Machines to Read and Comprehend
TLDR
A new methodology is defined that resolves this bottleneck and provides large-scale supervised reading comprehension data, enabling the development of a class of attention-based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
TLDR
BART is presented, a denoising autoencoder for pretraining sequence-to-sequence models, which matches the performance of RoBERTa on GLUE and SQuAD and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks.
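BART's denoising pre-training corrupts text and trains the model to reconstruct it; one of its noising functions, text infilling, replaces sampled spans with a single mask token. The sketch below is an illustrative approximation (Poisson-distributed span lengths, a placeholder <mask> symbol, minimum span length of 1), not the exact corruption schedule used in the paper.

```python
import numpy as np

MASK = "<mask>"

def text_infilling(tokens, mask_ratio=0.3, poisson_lambda=3.0, seed=0):
    """Replace randomly chosen spans with a single <mask> token until roughly
    mask_ratio of the tokens have been removed (illustrative approximation)."""
    rng = np.random.default_rng(seed)
    tokens = list(tokens)
    budget = int(mask_ratio * len(tokens))
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            span = int(min(max(rng.poisson(poisson_lambda), 1), budget))
            out.append(MASK)          # the whole span collapses to one mask token
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out

if __name__ == "__main__":
    sentence = "bart is trained to reconstruct the original text from a corrupted version".split()
    print(" ".join(text_infilling(sentence)))
```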
Language Models are Unsupervised Multitask Learners
TLDR
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Leveraging Pre-trained Checkpoints for Sequence Generation Tasks. arXiv preprint arXiv:1907.12461, 2019
MASS: Masked Sequence to Sequence Pre-training for Language Generation
TLDR
This work proposes MAsked Sequence to Sequence pre-training (MASS) for encoder-decoder based language generation tasks, which achieves state-of-the-art accuracy on unsupervised English-French translation, even beating the early attention-based supervised model.
Sample Efficient Text Summarization Using a Single Pre-Trained Transformer
TLDR
This work uses a pre-trained decoder-only network, where the same Transformer LM both encodes the source and generates the summary, and substantially improves over pre-trained Transformer encoder-decoder networks in limited-data settings.
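The decoder-only setup described above treats summarization as plain language modelling over a single concatenated sequence. A hedged sketch of the data formatting is below; the delimiter tokens are placeholders, not the paper's actual vocabulary.

```python
# Illustrative formatting for decoder-only summarization: the same LM reads the
# source and continues with the summary. Delimiters below are placeholders.
SEP = " <summarize> "
EOS = " <eos>"

def format_training_example(source: str, summary: str) -> str:
    """Single sequence for LM training; the loss is typically applied only to the
    summary tokens that follow the delimiter."""
    return source + SEP + summary + EOS

def format_inference_prompt(source: str) -> str:
    """At test time the LM is given the source plus delimiter and decodes the rest."""
    return source + SEP

if __name__ == "__main__":
    print(format_training_example("A long news article ...", "A short summary."))
    print(format_inference_prompt("Another article ..."))
```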