Extractive Summarization of Long Documents by Combining Global and Local Context

@article{Xiao2019ExtractiveSO,
  title={Extractive Summarization of Long Documents by Combining Global and Local Context},
  author={Wen Xiao and Giuseppe Carenini},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.08089}
}
In this paper, we propose a novel neural single-document extractive summarization model for long documents, incorporating both the global context of the whole document and the local context within the current topic. [...] Rather surprisingly, an ablation study indicates that the benefits of our model seem to come exclusively from modeling the local context, even for the longest documents.
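The paper's central idea, scoring sentences with both a document-level (global) signal and a topic-segment (local) signal, can be illustrated with a deliberately simplified sketch. This is not the authors' model: it uses bag-of-words vectors and cosine similarity where the paper uses learned neural representations, and the mixing weight `alpha` is a hypothetical stand-in for what the network learns.

```python
# Simplified sketch (not the authors' code): combine a global score
# (similarity of a sentence to the whole document) with a local score
# (similarity to the sentence's topic segment).
from collections import Counter
import math

def bow(text):
    # Bag-of-words vector as a word -> count mapping.
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def score_sentences(segments, alpha=0.5):
    """segments: list of topic segments, each a list of sentence strings.
    Returns (sentence, score) pairs; alpha trades global vs. local context."""
    doc_vec = bow(" ".join(s for seg in segments for s in seg))
    scored = []
    for seg in segments:
        seg_vec = bow(" ".join(seg))  # local (topic-segment) context
        for sent in seg:
            v = bow(sent)
            score = alpha * cosine(v, doc_vec) + (1 - alpha) * cosine(v, seg_vec)
            scored.append((sent, score))
    return scored
```

Top-scoring sentences would then be extracted as the summary; the paper's ablation finding quoted above suggests the local term does most of the work.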
Leveraging Information Bottleneck for Scientific Document Summarization
This paper presents an unsupervised extractive approach to summarizing long scientific documents, based on the Information Bottleneck principle, in two separate steps; the approach can be flexibly extended to a multi-view framework with different signals.
From Standard Summarization to New Tasks and Beyond: Summarization with Manifold Information
This paper surveys new summarization tasks and approaches arising in real-world applications of text summarization algorithms.
Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm
This work addresses the problem of unsupervised extractive document summarization, especially for long documents, using a dedicated Frank-Wolfe algorithm; it achieves better results on both datasets and works especially well when combined with embeddings for highly paraphrased summaries.
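As a rough illustration of why a Frank-Wolfe-style solver tends to produce sparse selections, here is a generic Frank-Wolfe loop over the probability simplex (an assumption for illustration, not the paper's actual formulation): each linear-minimization step selects a single vertex, i.e. a single coordinate, so early iterates put weight on only a few entries.

```python
# Generic Frank-Wolfe sketch: minimize a smooth convex objective over the
# probability simplex. The simplex's vertices are the coordinate basis
# vectors, so the linear-minimization oracle just picks the coordinate with
# the smallest gradient entry -- the source of sparsity.
def frank_wolfe(grad, dim, steps=100):
    x = [1.0 / dim] * dim                        # start at the simplex center
    for k in range(steps):
        g = grad(x)
        i = min(range(dim), key=lambda j: g[j])  # best vertex e_i
        gamma = 2.0 / (k + 2)                    # standard diminishing step
        x = [(1 - gamma) * xj for xj in x]       # move toward e_i
        x[i] += gamma
    return x
```

In a summarization setting, the coordinates that end up carrying weight could correspond to the sentences selected for the summary.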
Improving Unsupervised Extractive Summarization with Facet-Aware Modeling
Experimental results show that the novel facet-aware centrality-based ranking model consistently outperforms strong baselines, especially in long- and multi-document scenarios, and even performs comparably to some supervised models.
Sliding Selector Network with Dynamic Memory for Extractive Summarization of Long Documents
This work proposes a sliding selector network with dynamic memory for extractive summarization of long-form documents: it employs a sliding window to extract summary sentences segment by segment and adopts a memory mechanism to preserve and update history information dynamically, allowing semantic flow across different windows.
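The sliding-window-plus-memory idea can be caricatured in a few lines. This sketch is only an assumption-laden analogy to the paper's neural model: the "memory" here is just the set of words the summary already covers, updated after each window, whereas the paper learns a dynamic neural memory.

```python
# Toy sliding-window extractor (illustrative only): process the document
# window by window, and in each window prefer sentences that contribute
# the most words not yet covered by the growing summary.
def sliding_extract(sentences, window=3, per_window=1):
    memory = set()   # words already covered by the summary so far
    summary = []
    for start in range(0, len(sentences), window):
        win = sentences[start:start + window]
        # Rank sentences in this window by how many *new* words they add.
        ranked = sorted(win,
                        key=lambda s: -len(set(s.lower().split()) - memory))
        for sent in ranked[:per_window]:
            summary.append(sent)
            memory |= set(sent.lower().split())  # dynamic memory update
    return summary
```

The memory is what lets information flow across windows: a sentence that merely repeats earlier windows scores low even if it is central within its own segment.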
Joint abstractive and extractive method for long financial document summarization
In this paper, we show the results of our participation in the FNS 2021 shared task. In our work, we propose an end-to-end financial narrative summarization system that first selects salient sentences
On Generating Extended Summaries of Long Documents
This paper exploits the hierarchical structure of documents and incorporates it into an extractive summarization model through a multi-task learning approach, showing that multi-tasking can adjust the extraction probability distribution in favor of summary-worthy sentences across diverse sections.
Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks
This paper proposes a graph neural network (GNN)-based extractive summarization model that captures inter-sentence relationships efficiently via a graph-structured document representation, and integrates a joint neural topic model (NTM) to discover latent topics, which provide document-level features for sentence selection.
Systematically Exploring Redundancy Reduction in Summarizing Long Documents
This work systematically explore and compare different ways to deal with redundancy when summarizing long documents, and proposes three additional methods balancing non-redundancy and importance in a general and flexible way.
SUMDocS: Surrounding-aware Unsupervised Multi-Document Summarization
A novel method, SUMDocS (Surrounding-aware Unsupervised Multi-Document Summarization), is proposed; it incorporates rich surrounding (topically related) documents to improve the quality of extractive summarization without human supervision.

References

Showing 1-10 of 54 references
A Supervised Approach to Extractive Summarisation of Scientific Papers
This paper introduces a new dataset for summarisation of computer science publications, built by exploiting a large resource of author-provided summaries, and develops models on the dataset using both neural sentence encoding and traditional summarisation features.
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
This work proposes the first model for abstractive summarization of single, longer-form documents (e.g., research papers), consisting of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary.
Neural Summarization by Extracting Sentences and Words
This work develops a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor that allows for different classes of summarization models which can extract sentences or words.
SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents
We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
This work proposes several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.
Content Selection in Deep Learning Models of Summarization
It is suggested that creating a summarizer for a new domain is easier than previous work implies, and the benefit of deep learning models for summarization is called into question even for those domains that do have massive datasets.
BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization
This work presents a novel dataset, BIGPATENT, consisting of 1.3 million records of U.S. patent documents along with human-written abstractive summaries, which has the following properties: i) summaries contain a richer discourse structure with more recurring entities, ii) salient content is evenly distributed in the input, and iii) fewer and shorter extractive fragments are present in the summaries.
Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion
Improving Neural Abstractive Document Summarization with Explicit Information Selection Modeling
Experimental results demonstrate that explicitly modeling and optimizing the information selection process significantly improves document summarization performance, enabling the model to generate more informative and concise summaries and thus to significantly outperform state-of-the-art neural abstractive methods.
Combining Graph Degeneracy and Submodularity for Unsupervised Extractive Summarization
A fully unsupervised, extractive text summarization system that leverages a submodularity framework that allows summaries to be generated in a greedy way while preserving near-optimal performance guarantees is presented.