Corpus ID: 229679767

On Generating Extended Summaries of Long Documents

@article{Sotudeh2021OnGE,
  title={On Generating Extended Summaries of Long Documents},
  author={Sajad Sotudeh and Arman Cohan and Nazli Goharian},
  journal={ArXiv},
  year={2021},
  volume={abs/2012.14136}
}
Prior work in document summarization has mainly focused on generating short summaries of a document. While this type of summary helps get a high-level view of a given document, it is desirable in some cases to know more detailed information about its salient points that can't fit in a short summary. This is typically the case for longer documents such as a research paper, legal document, or a book. In this paper, we present a new method for generating extended summaries of long papers. Our… 

Citations

Incorporating domain knowledge for extractive summarization of legal case documents
TLDR: An unsupervised summarization algorithm, DELSumm, is proposed, which systematically incorporates guidelines from legal experts into an optimization setup and outperforms several supervised summarization models trained on thousands of document-summary pairs.
TLDR9+: A Large Scale Resource for Extreme Summarization of Social Media Posts
TLDR: This paper introduces TLDR9+, a large-scale summarization dataset containing over 9 million training instances extracted from the Reddit discussion forum.
Recursively Summarizing Books with Human Feedback
TLDR: This method combines learning from human feedback with recursive task decomposition: models trained on smaller parts of the task assist humans in giving feedback on the broader task, and the method generates sensible summaries of entire books.

References

Showing 1–10 of 30 references
Extractive Summarization of Long Documents by Combining Global and Local Context
TLDR: A novel neural single-document extractive summarization model for long documents that incorporates both the global context of the whole document and the local context within the current topic; it outperforms previous work, both extractive and abstractive.
Summaformers @ LaySumm 20, LongSumm 20
TLDR: This paper distinguishes between two types of summaries: a very short summary that captures the essence of a research paper in layman's terms, and a much longer, detailed summary that provides specific insights into the various ideas touched upon in the paper.
GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents
TLDR: A new multi-tasking approach that incorporates document structure into the summarizer outperforms the two other methods by large margins, ranking top on ROUGE-1 while staying competitive on ROUGE-2.
A Supervised Approach to Extractive Summarisation of Scientific Papers
TLDR: This paper introduces a new dataset for summarisation of computer science publications, built by exploiting a large resource of author-provided summaries, and develops models on it that make use of both neural sentence encoding and traditionally used summarisation features.
Section mixture models for scientific document summarization
In this paper, we present a system for summarization of scientific and structured documents that has three components: section mixture models are used for estimation of the weights of terms; a…
A Divide-and-Conquer Approach to the Summarization of Academic Articles
TLDR: A novel divide-and-conquer method for the summarization of long documents that processes the input in parts and generates a corresponding summary, achieving state-of-the-art results on two publicly available datasets of academic articles.
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
TLDR: This work proposes the first model for abstractive summarization of single, longer-form documents (e.g., research papers), consisting of a new hierarchical encoder that models a document's discourse structure and an attentive, discourse-aware decoder that generates the summary.
Text Summarization with Pretrained Encoders
TLDR: This paper introduces a novel document-level encoder based on BERT that expresses the semantics of a document and obtains representations for its sentences, and proposes a new fine-tuning schedule that adopts different optimizers for the encoder and the decoder to alleviate the mismatch between the two.
Coherent Citation-Based Summarization of Scientific Papers
TLDR: This work presents an approach for producing readable and cohesive citation-based summaries, and shows that it outperforms several baselines in both extraction quality and fluency.
Discourse-Aware Neural Extractive Text Summarization
TLDR: DiscoBERT extracts sub-sentential discourse units (instead of sentences) as candidates for extractive selection at a finer granularity, and outperforms state-of-the-art BERT-base models by a significant margin on popular summarization benchmarks.