Text Summarization with Pretrained Encoders

@article{Liu2019TextSW,
  title={Text Summarization with Pretrained Encoders},
  author={Yang Liu and Mirella Lapata},
  journal={ArXiv},
  year={2019},
  volume={abs/1908.08345}
}
Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. [...] Key Method: Our extractive model is built on top of this encoder by stacking several inter-sentence Transformer layers.
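The inter-sentence layers mentioned in the key method lend themselves to a compact sketch. Below is a minimal PyTorch sketch (not the authors' code) of a BERTSUM-style extractive head: each sentence is represented by a vector from the BERT encoder (e.g., its [CLS] token), a few stacked Transformer layers contextualize the sentences against each other, and a sigmoid scores each sentence for selection. The class name, layer count, and sizes are illustrative assumptions.

    # Minimal sketch of an extractive head with inter-sentence Transformer layers.
    # Sentence vectors come from a BERT encoder (not shown); hyperparameters are
    # illustrative, not the paper's exact configuration.
    import torch
    import torch.nn as nn

    class InterSentenceTransformer(nn.Module):
        def __init__(self, hidden_size=768, num_inter_layers=2, num_heads=8):
            super().__init__()
            layer = nn.TransformerEncoderLayer(
                d_model=hidden_size, nhead=num_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_inter_layers)
            self.scorer = nn.Linear(hidden_size, 1)

        def forward(self, sent_vecs, sent_mask):
            # sent_vecs: (batch, num_sents, hidden) sentence embeddings from BERT
            # sent_mask: (batch, num_sents), True marks padding positions
            ctx = self.encoder(sent_vecs, src_key_padding_mask=sent_mask)
            return torch.sigmoid(self.scorer(ctx)).squeeze(-1)  # selection probabilities

    # Example: score 5 sentences of a single document
    model = InterSentenceTransformer()
    sent_vecs = torch.randn(1, 5, 768)
    mask = torch.zeros(1, 5, dtype=torch.bool)
    print(model(sent_vecs, mask).shape)  # torch.Size([1, 5])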
Two-stage encoding Extractive Summarization
TLDR
A two-stage encoder model (TSEM) for extractive summarization that proposes a new strategy to fine-tune BERT to derive a meaningful document embedding, then selects the combination of important sentences that best matches the source document to compose the summary.
Abstractive Text Summarization based on Language Model Conditioning and Locality Modeling
TLDR
A new method, BERT-windowing, which allows chunk-wise processing of texts longer than the BERT window size, and an analysis of how locality modeling, i.e., the explicit restriction of calculations to the local context, can affect the summarization ability of the Transformer.
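The chunk-wise processing described above can be illustrated with a short sketch. The routine below is a simplified assumption, not the paper's BERT-windowing implementation: the token sequence is split into overlapping windows that fit the encoder's length limit, each window is encoded separately, and overlapping positions are averaged.

    # Sketch of windowed encoding for inputs longer than the encoder's limit.
    # The encode() callable and the window/stride values are assumptions.
    import torch

    def windowed_encode(token_ids, encode, window=512, stride=256):
        # token_ids: (seq_len,) LongTensor; encode: maps (1, w) ids -> (1, w, hidden)
        seq_len = token_ids.size(0)
        hidden, counts = None, None
        for start in range(0, seq_len, stride):
            end = min(start + window, seq_len)
            out = encode(token_ids[start:end].unsqueeze(0)).squeeze(0)  # (w, hidden)
            if hidden is None:
                hidden = torch.zeros(seq_len, out.size(-1))
                counts = torch.zeros(seq_len, 1)
            hidden[start:end] += out
            counts[start:end] += 1
            if end == seq_len:
                break
        return hidden / counts  # averaged representations over overlaps

    # Toy usage with a dummy "encoder"
    dummy_encode = lambda ids: torch.randn(1, ids.size(1), 768)
    reps = windowed_encode(torch.randint(0, 30000, (1300,)), dummy_encode)
    print(reps.shape)  # torch.Size([1300, 768])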
Domain Adaptation with Pre-trained Transformers for Query Focused Abstractive Text Summarization
TLDR
This paper applies a variety of techniques, including transfer learning, weakly supervised learning, and distant supervision, to pre-trained transformer-based summarization models to generate abstractive summaries for the Query Focused Text Summarization task.
Hybrid Extractive/Abstractive Summarization Using Pre-Trained Sequence-to-Sequence Models
Typical document summarization methods can be either extractive, by selecting appropriate parts of the input text to include in the summary, or abstractive, by generating new text based on a [...]
Transductive Learning for Abstractive News Summarization
TLDR
This work proposes the first application of transductive learning to summarization, utilizing the input document's summarizing sentences to construct references for learning at test time, and shows that its summaries become more abstractive and coherent.
ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization
TLDR
The proposed ARMAN, a Transformer-based encoder-decoder model pre-trained with three novel objectives, achieves state-of-the-art performance on all six summarization tasks measured by ROUGE and BERTScore.
Topic-Aware Abstractive Text Summarization
TLDR
This study proposes a topic-aware abstractive summarization (TAAS) framework that seamlessly incorporates neural topic modeling into an encoder-decoder-based sequence generation procedure via attention for summarization, and achieves performance comparable to PEGASUS and ProphetNet.
Discourse-Aware Neural Extractive Text Summarization
TLDR
DiscoBERT extracts sub-sentential discourse units (instead of sentences) as candidates for extractive selection at a finer granularity, and outperforms other state-of-the-art BERT-base models by a significant margin on popular summarization benchmarks.
SegaBERT: Pre-training of Segment-aware BERT for Language Understanding
TLDR
A segment-aware BERT is proposed by replacing the token position embedding of the Transformer with a combination of paragraph-index, sentence-index, and token-index embeddings; experimental results show that the pre-trained model can outperform the original BERT model on various NLP tasks.
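A minimal sketch of the segment-aware position embedding idea, assuming the replacement is a simple sum of three learned embeddings; the table sizes and class name are illustrative, not SegaBERT's actual configuration.

    # Sketch: position information as paragraph + sentence + within-sentence indices.
    import torch
    import torch.nn as nn

    class SegmentAwarePositions(nn.Module):
        def __init__(self, hidden=768, max_paras=64, max_sents=128, max_tokens=256):
            super().__init__()
            self.para = nn.Embedding(max_paras, hidden)
            self.sent = nn.Embedding(max_sents, hidden)
            self.tok = nn.Embedding(max_tokens, hidden)

        def forward(self, para_idx, sent_idx, tok_idx):
            # each index tensor has shape (batch, seq_len)
            return self.para(para_idx) + self.sent(sent_idx) + self.tok(tok_idx)

    emb = SegmentAwarePositions()
    p = torch.zeros(1, 6, dtype=torch.long)   # all tokens in paragraph 0
    s = torch.tensor([[0, 0, 0, 1, 1, 1]])    # two sentences
    t = torch.tensor([[0, 1, 2, 0, 1, 2]])    # position within each sentence
    print(emb(p, s, t).shape)  # torch.Size([1, 6, 768])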
Extractive Summarization as Text Matching
TLDR
This paper formulates the extractive summarization task as a semantic text matching problem, in which a source document and candidate summaries are matched in a semantic space, and proposes a semantic matching framework.
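The matching formulation above can be sketched as scoring candidate summaries against the document in a shared embedding space. The helper below is illustrative only: the brute-force enumeration of sentence combinations and the mean-pooled candidate embedding are assumptions, not the paper's framework.

    # Sketch: pick the sentence combination whose embedding is closest to the document's.
    import itertools
    import torch
    import torch.nn.functional as F

    def best_candidate(doc_vec, sentence_vecs, k=2):
        # doc_vec: (hidden,); sentence_vecs: list of (hidden,) tensors; k: summary length
        best_score, best_idx = -1.0, None
        for combo in itertools.combinations(range(len(sentence_vecs)), k):
            cand_vec = torch.stack([sentence_vecs[i] for i in combo]).mean(dim=0)
            score = F.cosine_similarity(doc_vec, cand_vec, dim=0).item()
            if score > best_score:
                best_score, best_idx = score, combo
        return best_idx, best_score

    doc = torch.randn(768)
    sents = [torch.randn(768) for _ in range(6)]
    print(best_candidate(doc, sents, k=2))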

References

Showing 1-10 of 36 references
HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
TLDR
This work proposes HIBERT (shorthand for HIerarchical Bidirectional Encoder Representations from Transformers) for document encoding, along with a method to pre-train it using unlabeled data, and achieves state-of-the-art performance on two summarization benchmark datasets.
A Deep Reinforced Model for Abstractive Summarization
TLDR
A neural network model with a novel intra-attention that attends over the input and the continuously generated output separately, and a new training method that combines standard supervised word prediction with reinforcement learning (RL), producing higher-quality summaries.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
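The "one additional output layer" recipe is easy to sketch: a single linear layer on top of the encoder's [CLS] representation. The wrapper below uses a stand-in encoder callable; in practice this would be a pretrained BERT model, which is assumed rather than shown.

    # Sketch: one task-specific output layer on top of a pretrained encoder's [CLS] vector.
    import torch
    import torch.nn as nn

    class ClassifierHead(nn.Module):
        def __init__(self, encoder, hidden=768, num_labels=2):
            super().__init__()
            self.encoder = encoder
            self.out = nn.Linear(hidden, num_labels)  # the one additional layer

        def forward(self, input_ids):
            states = self.encoder(input_ids)   # (batch, seq_len, hidden)
            cls_vec = states[:, 0]             # [CLS] token representation
            return self.out(cls_vec)           # task logits

    dummy_encoder = lambda ids: torch.randn(ids.size(0), ids.size(1), 768)
    model = ClassifierHead(dummy_encoder)
    print(model(torch.randint(0, 30000, (2, 16))).shape)  # torch.Size([2, 2])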
Bottom-Up Abstractive Summarization
TLDR
This work explores the use of data-efficient content selectors to over-determine phrases in a source document that should be part of the summary, and shows that this approach improves the ability to compress text, while still generating fluent summaries.
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
TLDR
This work proposes several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.
Pre-trained language model representations for language generation
TLDR
This paper examines different strategies to integrate pre-trained representations into sequence-to-sequence models, applies them to neural machine translation and abstractive summarization, and finds that pre-trained representations are most effective when added to the encoder network, which slows inference by only 14%.
Unified Language Model Pre-training for Natural Language Understanding and Generation
TLDR
A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks, and that compares favorably with BERT on the GLUE benchmark and the SQuAD 2.0 and CoQA question answering tasks.
A Neural Attention Model for Abstractive Sentence Summarization
TLDR
This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.
Get To The Point: Summarization with Pointer-Generator Networks
TLDR
A novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways, using a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator.
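The copy mechanism described above boils down to mixing two distributions. Below is a sketch of the final mixture, p_gen * P_vocab + (1 - p_gen) * P_copy, where the copy distribution is obtained by scattering attention weights onto the source token ids; shapes and the way p_gen is produced are simplified assumptions.

    # Sketch of the pointer-generator output distribution.
    import torch

    def final_distribution(vocab_dist, attn, src_ids, p_gen):
        # vocab_dist: (batch, vocab); attn: (batch, src_len);
        # src_ids: (batch, src_len) LongTensor; p_gen: (batch, 1)
        copy_dist = torch.zeros_like(vocab_dist)
        copy_dist.scatter_add_(1, src_ids, attn)  # route attention mass to source words
        return p_gen * vocab_dist + (1 - p_gen) * copy_dist

    vocab_dist = torch.softmax(torch.randn(2, 100), dim=-1)
    attn = torch.softmax(torch.randn(2, 7), dim=-1)
    src_ids = torch.randint(0, 100, (2, 7))
    p_gen = torch.sigmoid(torch.randn(2, 1))
    print(final_distribution(vocab_dist, attn, src_ids, p_gen).sum(dim=-1))  # ~1.0 each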
Attention is All you Need
TLDR
A new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely, is proposed; it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.
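The core operation of the Transformer is scaled dot-product attention, which fits in a few lines. The sketch below shows softmax(QK^T / sqrt(d_k)) V with an optional mask; it illustrates the mechanism generically rather than any particular implementation.

    # Sketch of scaled dot-product attention.
    import math
    import torch

    def scaled_dot_product_attention(q, k, v, mask=None):
        # q, k, v: (batch, heads, seq_len, d_k); mask: True marks positions to ignore
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        if mask is not None:
            scores = scores.masked_fill(mask, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        return weights @ v, weights

    q = k = v = torch.randn(1, 8, 10, 64)
    out, w = scaled_dot_product_attention(q, k, v)
    print(out.shape, w.shape)  # torch.Size([1, 8, 10, 64]) torch.Size([1, 8, 10, 10])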