TLDR: Extreme Summarization of Scientific Documents

@inproceedings{Cachola2020TLDRES,
  title={TLDR: Extreme Summarization of Scientific Documents},
  author={Isabel Cachola and Kyle Lo and Arman Cohan and Daniel S. Weld},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2020},
  year={2020}
}
We introduce TLDR generation, a new form of extreme summarization, for scientific papers. TLDR generation involves high source compression and requires expert background knowledge and understanding of complex domain-specific language. To facilitate study on this task, we introduce SCITLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers. SCITLDR contains both author-written and expert-derived TLDRs, where the latter are collected using a novel annotation protocol that produces high… 
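For readers who want to try the dataset directly, below is a minimal loading sketch. It assumes the public Hugging Face release under the name allenai/scitldr; the config name ("Abstract") and the "source"/"target" field names follow that release and should be verified against the official repository.

# Minimal sketch: load SCITLDR and inspect one paper's abstract sentences and gold TLDRs.
# Assumes the public Hugging Face release "allenai/scitldr"; the config and field
# names ("Abstract", "source", "target") follow that release.
from datasets import load_dataset

scitldr = load_dataset("allenai/scitldr", "Abstract")
example = scitldr["train"][0]

abstract_sentences = example["source"]  # list of source sentences
gold_tldrs = example["target"]          # one or more reference TLDRs (multi-target)

print(" ".join(abstract_sentences))
print("TLDRs:", gold_tldrs)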

Using Pre-Trained Transformer for Better Lay Summarization
TLDR
This paper presents an approach that uses Pre-training with Extracted Gap-sentences for Abstractive Summarization (PEGASUS) to produce the lay summary and combines it with a BERT-based extractive summarization model and sentence-level readability metrics to further improve the quality of the summary.
MS^2: Multi-Document Summarization of Medical Studies
TLDR
This work releases MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20K summaries derived from the scientific literature that facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain.
ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis
TLDR
A novel ReviewRobot is built to automatically assign a review score and write comments for multiple categories such as novelty and meaningful comparison, and can serve as an assistant for paper reviewers, program chairs and authors.
Automated Lay Language Summarization of Biomedical Scientific Reviews
TLDR
This paper introduces the novel task of automated generation of lay language summaries of biomedical scientific reviews, and constructs a dataset to support the development and evaluation of automated methods through which to enhance the accessibility of the biomedical literature.
Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols
TLDR
This work introduces ScholarPhi, an augmented reading interface with four novel features: tooltips that surface position-sensitive definitions from elsewhere in a paper, a filter over the paper that “declutters” it to reveal how the term or symbol is used across the paper, automatic equation diagrams that expose multiple definitions in parallel, and an automatically generated glossary of important terms and symbols.
CiteSum: Citation Text-guided Scientific Extreme Summarization and Low-resource Domain Adaptation
TLDR
A simple yet effective approach to automatically extracting TLDR summaries for scientific papers from their citation texts is proposed and a new benchmark CiteSum without human annotation is created, which is around 30 times larger than the previous human-curated dataset SciTLDR.
Automated scholarly paper review: Technologies and challenges
TLDR
This review paper proposes the concept and pipeline of automated scholarly paper review (ASPR), reviews the relevant literature and technologies for achieving a full-scale computerized review process, and concludes that there is already corresponding research and implementation at each stage of ASPR.
X-SCITLDR: cross-lingual extreme summarization of scholarly documents
TLDR
This paper presents a new X-SCITLDR dataset for multilingual summarization and thoroughly benchmarks different models based on a state-of-the-art multilingual pre-trained model, including a two-stage 'summarize and translate' approach and a direct cross-lingual model.
Twist Decoding: Diverse Generators Guide Each Other
TLDR
This work introduces Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models, and hopes it will encourage researchers and practitioners to examine generation models collectively, not just independently, and to seek out models with complementary strengths to the currently available models.
GenCompareSum: a hybrid unsupervised summarization method using salience
TLDR
This work proposes a hybrid, unsupervised, abstractive-extractive approach to text summarization (TS), in which the most important sentences of the document are selected by choosing those most similar to abstractively generated salient texts, with similarity computed using BERTScore.
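As an illustration of the selection step described above, here is a simplified sketch, not the GenCompareSum authors' exact pipeline: document sentences are scored against a few abstractively generated salient texts with BERTScore, and the top-scoring sentences form the extractive summary. The generated texts are assumed to come from any seq2seq model.

# Simplified sketch of BERTScore-based sentence selection; an illustration of
# the general idea, not the authors' exact pipeline.
from bert_score import score

def select_sentences(doc_sentences, generated_texts, k=3):
    totals = [0.0] * len(doc_sentences)
    for text in generated_texts:
        # Score every document sentence against this generated text (BERTScore F1).
        _, _, f1 = score(doc_sentences, [text] * len(doc_sentences), lang="en", verbose=False)
        for i, value in enumerate(f1.tolist()):
            totals[i] += value
    top = sorted(range(len(doc_sentences)), key=lambda i: totals[i], reverse=True)[:k]
    return [doc_sentences[i] for i in sorted(top)]  # preserve document order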
...

References

SHOWING 1-10 OF 63 REFERENCES
A Supervised Approach to Extractive Summarisation of Scientific Papers
TLDR
This paper introduces a new dataset for summarisation of computer science publications by exploiting a large resource of author-provided summaries and develops models on the dataset making use of both neural sentence encoding and traditionally used summarisation features.
Extractive Summarization of Long Documents by Combining Global and Local Context
TLDR
A novel neural single-document extractive summarization model for long documents that incorporates both the global context of the whole document and the local context within the current topic, and outperforms previous extractive and abstractive models.
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
TLDR
This work proposes pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective, PEGASUS, and demonstrates it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores.
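For context, a minimal sketch of running a released PEGASUS checkpoint with Hugging Face Transformers follows; the checkpoint name "google/pegasus-arxiv" is the publicly available scientific-paper variant and is an assumption of this sketch, not a setting taken from the paper above.

# Minimal sketch: abstractive summarization with a released PEGASUS checkpoint.
# The checkpoint name "google/pegasus-arxiv" is an assumption of this sketch.
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-arxiv"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

document = "We introduce TLDR generation, a new form of extreme summarization for scientific papers."
batch = tokenizer(document, truncation=True, padding="longest", return_tensors="pt")
summary_ids = model.generate(**batch, num_beams=4, max_length=64)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))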
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
TLDR
This work proposes the first model for abstractive summarization of single, longer-form documents (e.g., research papers), consisting of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary.
Headline Generation: Learning from Decomposable Document Titles
TLDR
A novel method for generating titles for unstructured text documents is proposed and the results of a randomized double-blind trial in which subjects were unaware of which titles were human or machine-generated are presented.
TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks
TLDR
This paper proposes a novel method that automatically generates summaries for scientific papers by utilizing videos of talks at scientific conferences, hypothesizing that such talks constitute a coherent and concise description of the papers' content and can form the basis for good summaries.
Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
TLDR
A novel abstractive model is proposed which is conditioned on the article’s topics and based entirely on convolutional neural networks, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.
Text Summarization with Pretrained Encoders
TLDR
This paper introduces a novel document-level encoder based on BERT that expresses the semantics of a document and obtains representations for its sentences, and proposes a new fine-tuning schedule that adopts different optimizers for the encoder and the decoder to alleviate the mismatch between the two.
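The two-optimizer fine-tuning schedule mentioned above can be sketched as follows; this is an illustrative PyTorch outline in which the learning rates and the "encoder." parameter-name prefix are placeholder assumptions, not the paper's exact configuration.

# Illustrative sketch of the two-optimizer idea: the pretrained encoder gets a
# smaller learning rate than the randomly initialized decoder. Learning rates
# and the "encoder." parameter-name prefix are placeholder assumptions.
import torch

def build_optimizers(model, enc_lr=2e-5, dec_lr=1e-3):
    enc_params = [p for n, p in model.named_parameters() if n.startswith("encoder.")]
    dec_params = [p for n, p in model.named_parameters() if not n.startswith("encoder.")]
    enc_opt = torch.optim.Adam(enc_params, lr=enc_lr)
    dec_opt = torch.optim.Adam(dec_params, lr=dec_lr)
    return enc_opt, dec_opt

# In the training loop, both optimizers step on the same loss:
#   loss.backward(); enc_opt.step(); dec_opt.step()
#   enc_opt.zero_grad(); dec_opt.zero_grad()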
Data-driven Summarization of Scientific Articles
TLDR
This work generates two novel multi-sentence summarization datasets from scientific articles and tests the suitability of a wide range of existing extractive and abstractive neural network-based summarization approaches, demonstrating that scientific papers are suitable for data-driven text summarization.
...