TLDR: Extreme Summarization of Scientific Documents
@inproceedings{Cachola2020TLDRES,
  title     = {TLDR: Extreme Summarization of Scientific Documents},
  author    = {Isabel Cachola and Kyle Lo and Arman Cohan and Daniel S. Weld},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
  year      = {2020}
}
We introduce TLDR generation, a new form of extreme summarization, for scientific papers. TLDR generation involves high source compression and requires expert background knowledge and understanding of complex domain-specific language. To facilitate study on this task, we introduce SCITLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers. SCITLDR contains both author-written and expert-derived TLDRs, where the latter are collected using a novel annotation protocol that produces high…
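To make the multi-target setup concrete, here is a minimal sketch of loading and inspecting SCITLDR. It assumes the dataset is mirrored on the Hugging Face Hub as `allenai/scitldr` with `source` and `target` fields; the hosting location, config name, and field names are assumptions to verify against the dataset card, not details stated in the abstract above.

```python
# Minimal sketch: inspect SciTLDR's multi-target structure.
# Assumes the dataset is mirrored on the Hugging Face Hub as "allenai/scitldr"
# with a config named "Abstract" and fields "source" (paper sentences) and
# "target" (one or more TLDRs); check the dataset card before relying on these.
from datasets import load_dataset

scitldr = load_dataset("allenai/scitldr", "Abstract", split="test")

example = scitldr[0]
print("Input sentences:", len(example["source"]))
for i, tldr in enumerate(example["target"], start=1):
    print(f"TLDR {i}: {tldr}")
```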
94 Citations
Using Pre-Trained Transformer for Better Lay Summarization
- Computer Science · SDP
- 2020
This paper uses PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization) to produce lay summaries and combines them with a BERT-based extractive summarization model and sentence-level readability metrics to further improve summary quality.
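The readability scoring mentioned above can be illustrated with a standard formula; the sketch below ranks candidate extractive sentences by Flesch Reading Ease. The formula is standard, but using it this way (and the placeholder candidates) is an illustration, not the authors' exact pipeline.

```python
import re

def approx_syllables(word: str) -> int:
    # Crude syllable estimate: count vowel groups (sufficient for ranking).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(sentence: str) -> float:
    # Flesch Reading Ease for a single sentence (higher = easier to read).
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        return 0.0
    syllables = sum(approx_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) - 84.6 * (syllables / len(words))

# Rank extractive candidates so more readable sentences are preferred;
# the candidates below are placeholders, not output of any particular model.
candidates = [
    "We propose a hierarchical attention mechanism over discourse sections.",
    "Our model beats the baseline.",
]
for s in sorted(candidates, key=flesch_reading_ease, reverse=True):
    print(round(flesch_reading_ease(s), 1), s)
```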
MS^2: Multi-Document Summarization of Medical Studies
- Computer Science · EMNLP
- 2021
This work releases MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470K documents and 20K summaries derived from the scientific literature. The dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain.
ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis
- Computer Science · INLG
- 2020
ReviewRobot is built to automatically assign review scores and write comments for multiple categories, such as novelty and meaningful comparison; it can serve as an assistant for paper reviewers, program chairs, and authors.
Automated Lay Language Summarization of Biomedical Scientific Reviews
- Computer Science · AAAI
- 2021
An analysis of the challenges in automatically generating lay-language summaries of biomedical scientific reviews indicates that summaries produced by contemporary neural architectures can achieve promising quality and readability compared with references written for the lay public by experts.
Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols
- Computer Science · CHI
- 2021
This work introduces ScholarPhi, an augmented reading interface with four novel features: tooltips that surface position-sensitive definitions from elsewhere in a paper, a filter over the paper that “declutters” it to reveal how the term or symbol is used across the paper, automatic equation diagrams that expose multiple definitions in parallel, and an automatically generated glossary of important terms and symbols.
CiteSum: Citation Text-guided Scientific Extreme Summarization and Domain Adaptation with Limited Supervision
- Computer Science · EMNLP
- 2022
This work proposes a simple yet effective approach for automatically extracting TLDR summaries of scientific papers from their citation texts, and uses it to create CiteSum, a new benchmark built without human annotation that is around 30 times larger than the human-curated SciTLDR dataset.
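A rough illustration of citation-text mining in this spirit: take the sentences of a citing paper that mention a given citation marker and keep those of TLDR-like length. The marker format, length thresholds, and naive sentence splitter below are placeholder choices for illustration, not the CiteSum pipeline.

```python
import re

def citation_sentences(citing_text: str, marker: str) -> list[str]:
    """Return sentences from a citing paper that mention the citation marker.

    Simplified illustration of citation-text mining; `marker` might be
    e.g. "(Cachola et al., 2020)". Not the CiteSum pipeline itself.
    """
    # Naive sentence split; a real pipeline would use a proper segmenter.
    sentences = re.split(r"(?<=[.!?])\s+", citing_text)
    return [s for s in sentences if marker in s and 10 <= len(s.split()) <= 40]

citing_text = (
    "Prior work studies extreme summarization of papers. "
    "SciTLDR (Cachola et al., 2020) introduces a multi-target TLDR dataset. "
    "We extend this line of work to citation-guided supervision."
)
print(citation_sentences(citing_text, "(Cachola et al., 2020)"))
```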
Automated scholarly paper review: Technologies and challenges
- Computer Science
- 2021
This review paper proposes the concept and pipeline of automated scholarly paper review (ASPR), reviews the relevant literature and technologies for achieving a full-scale computerized review process, and concludes that there is already corresponding research and implementation at each stage of ASPR.
The Semantic Scholar Open Data Platform
- Computer Science · ArXiv
- 2023
This paper combines public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction to build the Semantic Scholar Academic Graph, the largest open scientific literature graph to date.
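A small sketch of querying the Semantic Scholar Graph API for a paper's title and its machine-generated TLDR is shown below. The endpoint and the `tldr` field follow the public API documentation, but the exact field names and the example paper id are best-effort assumptions to check against the current docs.

```python
# Sketch: fetch a paper's title and machine-generated TLDR from the
# Semantic Scholar Graph API. Endpoint, field names, and the example paper id
# are assumptions based on the public documentation; verify before use.
import requests

paper_id = "arXiv:2004.15011"  # assumed arXiv id of the SciTLDR paper
url = f"https://api.semanticscholar.org/graph/v1/paper/{paper_id}"
resp = requests.get(url, params={"fields": "title,tldr"}, timeout=30)
resp.raise_for_status()
data = resp.json()
print(data["title"])
print((data.get("tldr") or {}).get("text"))
```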
EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain
- Computer Science · EMNLP
- 2022
This work proposes EUR-Lex-Sum, a novel dataset based on manually curated document summaries of legal acts from the European Union law platform (EUR-Lex), and compares key characteristics of the resource to existing summarization resources.
References
SHOWING 1-10 OF 59 REFERENCES
A Supervised Approach to Extractive Summarisation of Scientific Papers
- Computer Science · CoNLL
- 2017
This paper introduces a new dataset for the summarisation of computer science publications by exploiting a large resource of author-provided summaries, and develops models on the dataset using both neural sentence encoding and traditional summarisation features.
Extractive Summarization of Long Documents by Combining Global and Local Context
- Computer Science · EMNLP
- 2019
A novel neural single-document extractive summarization model for long documents incorporates both the global context of the whole document and the local context within the current topic, and outperforms previous work, including both extractive and abstractive models.
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
- Computer Science · ICML
- 2020
This work proposes pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective, PEGASUS, and demonstrates that it achieves state-of-the-art performance on all 12 downstream datasets as measured by ROUGE scores.
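The gap-sentence objective can be sketched as: score each sentence by how well it summarizes the rest of the document, mask the top-scoring ones, and use them as the generation target. The selection below uses a unigram-F1 proxy for ROUGE-1 and toy text; it illustrates the idea rather than reproducing the original implementation.

```python
# Sketch of PEGASUS-style gap-sentence selection with a unigram-F1 proxy for
# ROUGE-1. Illustration of the self-supervised objective, not the original code.
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    c, r = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def select_gap_sentences(sentences: list[str], ratio: float = 0.3) -> list[int]:
    # Score each sentence against the rest of the document, keep the top ratio.
    scores = []
    for i, sent in enumerate(sentences):
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        scores.append((unigram_f1(sent, rest), i))
    k = max(1, int(len(sentences) * ratio))
    return sorted(i for _, i in sorted(scores, reverse=True)[:k])

doc = [
    "We study extreme summarization of scientific papers.",
    "A new dataset of expert-written TLDRs is collected.",
    "Experiments show strong gains over extractive baselines.",
]
masked_idx = select_gap_sentences(doc)
target = " ".join(doc[i] for i in masked_idx)           # pseudo-summary target
source = " ".join("[MASK1]" if i in masked_idx else s   # input with sentence gaps
                  for i, s in enumerate(doc))
print(source)
print(target)
```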
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
- Computer Science · NAACL
- 2018
This work proposes the first model for abstractive summarization of single, longer-form documents (e.g., research papers), consisting of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary.
Headline Generation: Learning from Decomposable Document Titles
- Computer Science
- 2019
A novel method for generating titles for unstructured text documents is proposed and the results of a randomized double-blind trial in which subjects were unaware of which titles were human or machine-generated are presented.
TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks
- Computer Science · ACL
- 2019
This paper proposes a novel method that automatically generates summaries for scientific papers by utilizing videos of conference talks, hypothesizing that such talks constitute a coherent and concise description of the papers’ content and can form the basis for good summaries.
Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
- Computer Science · EMNLP
- 2018
A novel abstractive model is proposed which is conditioned on the article’s topics and based entirely on convolutional neural networks, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.
Text Summarization with Pretrained Encoders
- Computer Science · EMNLP
- 2019
This paper introduces a novel document-level encoder based on BERT that expresses the semantics of a document and obtains representations for its sentences, and proposes a new fine-tuning schedule that adopts different optimizers for the encoder and the decoder to alleviate the mismatch between the two.
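The two-optimizer fine-tuning schedule can be sketched as separate Adam instances with different learning rates and warmups for the pretrained encoder and the freshly initialized decoder; the model stubs and hyperparameters below are illustrative placeholders rather than the paper's exact configuration.

```python
# Sketch of the "two optimizers" idea: a small learning rate with a long warmup
# for the pretrained encoder, a larger learning rate with a short warmup for the
# randomly initialized decoder. Model classes and values are placeholders.
import torch
from torch import nn, optim

class AbstractiveSummarizer(nn.Module):
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

model = AbstractiveSummarizer(
    encoder=nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model=768, nhead=12), 12),
    decoder=nn.TransformerDecoder(nn.TransformerDecoderLayer(d_model=768, nhead=8), 6),
)

enc_opt = optim.Adam(model.encoder.parameters(), lr=2e-3, betas=(0.9, 0.999))
dec_opt = optim.Adam(model.decoder.parameters(), lr=1e-1, betas=(0.9, 0.999))

def lr_scale(step: int, warmup: int) -> float:
    # Noam-style schedule: linear warmup then inverse-square-root decay.
    step = max(step, 1)
    return min(step ** -0.5, step * warmup ** -1.5)

enc_sched = optim.lr_scheduler.LambdaLR(enc_opt, lambda s: lr_scale(s, warmup=20000))
dec_sched = optim.lr_scheduler.LambdaLR(dec_opt, lambda s: lr_scale(s, warmup=10000))

# In the training loop, both optimizers (and schedulers) are stepped together:
#   loss.backward(); enc_opt.step(); dec_opt.step()
#   enc_sched.step(); dec_sched.step()
#   enc_opt.zero_grad(); dec_opt.zero_grad()
```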
Data-driven Summarization of Scientific Articles
- Computer Science · ArXiv
- 2018
This work generates two novel multi-sentence summarization datasets from scientific articles and tests the suitability of a wide range of existing extractive and abstractive neural network-based summarization approaches, demonstrating that scientific papers are suitable for data-driven text summarization.