TLDR: Extreme Summarization of Scientific Documents

@inproceedings{Cachola2020TLDRES,
  title={TLDR: Extreme Summarization of Scientific Documents},
  author={Isabel Cachola and Kyle Lo and Arman Cohan and Daniel S. Weld},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2020},
  year={2020}
}
We introduce TLDR generation, a new form of extreme summarization, for scientific papers. TLDR generation involves high source compression and requires expert background knowledge and understanding of complex domain-specific language. To facilitate study on this task, we introduce SCITLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers. SCITLDR contains both author-written and expert-derived TLDRs, where the latter are collected using a novel annotation protocol that produces high…

Figures and Tables from this paper

Citations

Using Pre-Trained Transformer for Better Lay Summarization
TLDR
This paper presents an approach that uses Pre-training with Extracted Gap-sentences for Abstractive Summarization (PEGASUS) to produce the lay summary, combining it with an extractive summarization model based on Bidirectional Encoder Representations from Transformers (BERT) and readability metrics that measure sentence readability to further improve the quality of the summary.
MS2: Multi-Document Summarization of Medical Studies
TLDR
This work releases MS^2 (Multi-Document Summarization of Medical Studies), a dataset of over 470k documents and 20k summaries derived from the scientific literature that facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is the first large-scale, publicly available multi-document summarization dataset in the biomedical domain.
ReviewRobot: Explainable Paper Review Generation based on Knowledge Synthesis
TLDR
A novel system, ReviewRobot, is built to automatically assign a review score and write comments for multiple categories such as novelty and meaningful comparison; it can serve as an assistant for paper reviewers, program chairs, and authors.
Augmenting Scientific Papers with Just-in-Time, Position-Sensitive Definitions of Terms and Symbols
TLDR
This work introduces ScholarPhi, an augmented reading interface with four novel features: tooltips that surface position-sensitive definitions from elsewhere in a paper, a filter over the paper that “declutters” it to reveal how the term or symbol is used across the paper, automatic equation diagrams that expose multiple definitions in parallel, and an automatically generated glossary of important terms and symbols.
Bringing Structure into Summaries: a Faceted Summarization Dataset for Long Scientific Documents
TLDR
This study introduces a faceted summarization dataset for long scientific documents, arguing that faceted summarization will spur further advances in summarization research and foster the development of NLP systems that can leverage the structured information in both long texts and summaries.
D2S: Document-to-Slide Generation Via Query-Based Text Summarization
TLDR
D2S is presented, a novel system that tackles the document-to-slides task with a two-step, query-based text summarization approach; results suggest that long-form QA outperforms state-of-the-art summarization baselines on both automated ROUGE metrics and qualitative human evaluation.
Summary Grounded Conversation Generation
TLDR
Noting that language generation has improved immensely in recent years with the advancement of pretrained language models, this work investigates how such models can be utilized to generate entire conversations, given only a summary of a conversation as the input.
The Effect of Pretraining on Extractive Summarization for Scientific Documents
TLDR
This work derives significant performance improvements using an intermediate pretraining step that leverages existing summarization datasets and reports state-of-the-art results on a recently released scientific summarization dataset, SciTLDR.
Breaking Down Walls of Text: How Can NLP Benefit Consumer Privacy?
TLDR
The goal is to provide a roadmap for the development and use of language technologies to empower users to reclaim control over their privacy, limit privacy harms, and rally research efforts from the community towards addressing an issue with large social impact.
Generating Informative Conclusions for Argumentative Texts
TLDR
The task of generating informative conclusions is introduced, a large-scale corpus of 136,996 samples of argumentative texts and their conclusions is compiled, and two paradigms for conclusion generation are investigated: one extractive, the other abstractive in nature.

References

SHOWING 1-10 OF 62 REFERENCES
A Supervised Approach to Extractive Summarisation of Scientific Papers
TLDR
This paper introduces a new dataset for summarisation of computer science publications by exploiting a large resource of author-provided summaries, and develops models on the dataset making use of both neural sentence encoding and traditionally used summarisation features.
Extractive Summarization of Long Documents by Combining Global and Local Context
TLDR
A novel neural single-document extractive summarization model for long documents is proposed, incorporating both the global context of the whole document and the local context within the current topic; it outperforms previous work, both extractive and abstractive models.
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents
TLDR
This work proposes the first model for abstractive summarization of single, longer-form documents (e.g., research papers), consisting of a new hierarchical encoder that models the discourse structure of a document, and an attentive discourse-aware decoder to generate the summary.
Headline Generation: Learning from Decomposable Document Titles
TLDR
A novel method for generating titles for unstructured text documents is proposed and the results of a randomized double-blind trial in which subjects were unaware of which titles were human or machine-generated are presented.
TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks
TLDR
This paper proposes a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences, and hypothesizes that such talks constitute a coherent and concise description of the papers’ content, and can form the basis for good summaries.
Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
TLDR
A novel abstractive model is proposed which is conditioned on the article’s topics and based entirely on convolutional neural networks, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.
Text Summarization with Pretrained Encoders
TLDR
This paper introduces a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences and proposes a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two.
Data-driven Summarization of Scientific Articles
TLDR
This work generates two novel multi-sentence summarization datasets from scientific articles and tests the suitability of a wide range of existing extractive and abstractive neural-network-based summarization approaches, demonstrating that scientific papers are suitable for data-driven text summarization.
BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization
TLDR
This work presents a novel dataset, BIGPATENT, consisting of 1.3 million records of U.S. patent documents along with human-written abstractive summaries, which has the following properties: i) summaries contain a richer discourse structure with more recurring entities, ii) salient content is evenly distributed in the input, and iii) fewer and shorter extractive fragments are present in the summaries.