TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks

Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, David Konopnicki
Annual Meeting of the Association for Computational Linguistics
Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers’ content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper… 

SciBERTSUM: Extractive Summarization for Scientific Documents

SciBERTSUM, a summarization framework designed for long documents such as scientific papers with more than 500 sentences, is introduced, and the results show the superiority of the model in terms of ROUGE scores.

Community-Driven Comprehensive Scientific Paper Summarization: Insight from cvpaper.challenge

The present paper introduces a group activity involving writing summaries of conference proceedings by volunteer participants. The rapid increase in scientific papers is a heavy burden for

Semi-automatic Labelling of Scientific Articles using Deep Learning to Enlarge Benchmark Data for Scientific Summarization

This research proposal intends to apply deep learning methods to enlarge a small seed corpus of annotated scientific articles using semi-supervised/automatic annotation approaches, and to measure the quality of the annotated corpus on downstream informative summaries using various evaluation techniques.

Extractive Research Slide Generation Using Windowed Labeling Ranking

A method to automatically generate slides for scientific articles, based on a corpus of 5000 paper-slide pairs compiled from conference proceedings websites, that outperforms several baseline methods, including SummaRuNNer, by a significant margin in terms of ROUGE score.

Unsupervised document summarization using pre-trained sentence embeddings and graph centrality

A method for incorporating sentence embeddings produced by deep language models into extractive summarization techniques based on graph centrality in an unsupervised manner that can summarize any kind of document of any size and can satisfy any length constraints for the summaries produced.

Generating Diverse Extended Summaries of Scientific Articles

This paper proposes an attention-based BiLSTM-CNN framework for generating extended summaries of scientific articles and benchmarks the model against baseline techniques; the results reveal that the proposed attention-based deep neural network outperforms other models by a significant margin.

LongSumm 2021: Session based automatic summarization model for scientific document

This paper proposes a session-based automatic summarization model (SBAS), which uses a session and ensemble mechanism to generate long summaries and achieves the best performance in the LongSumm task.

TLDR: Extreme Summarization of Scientific Documents

This work introduces SCITLDR, a new multi-target dataset of 5.4K TLDRs over 3.2K papers, and proposes CATTS, a simple yet effective learning strategy for generating TLDRs that exploits titles as an auxiliary training signal.

CiteSum: Citation Text-guided Scientific Extreme Summarization and Low-resource Domain Adaptation

A simple yet effective approach to automatically extracting TLDR summaries for scientific papers from their citation texts is proposed and a new benchmark CiteSum without human annotation is created, which is around 30 times larger than the previous human-curated dataset SciTLDR.

Scientific Document Summarization for LaySumm ’20 and LongSumm ’20

This paper distinguishes between two types of summaries, namely, a very short summary that captures the essence of the research paper in layman terms restricting overtly specific technical jargon and a much longer detailed summary aimed at providing specific insights into various ideas touched upon in the paper.



A Supervised Approach to Extractive Summarisation of Scientific Papers

This paper introduces a new dataset for summarisation of computer science publications by exploiting a large resource of author-provided summaries, and develops models on the dataset that make use of both neural sentence encoding and traditionally used summarisation features.

ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks

The first large-scale manually-annotated corpus for scientific papers is developed and released by enabling faster annotation, and summarization methods that integrate the authors’ original highlights and the article’s actual impacts on the community are proposed to create comprehensive, hybrid summaries.

Neural Summarization by Extracting Sentences and Words

This work develops a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor that allows for different classes of summarization models which can extract sentences or words.

Data-driven Summarization of Scientific Articles

This work generates two novel multi-sentence summarization datasets from scientific articles and tests the suitability of a wide range of existing extractive and abstractive neural network-based summarization approaches, demonstrating that scientific papers are suitable for data-driven text summarization.

Coherent Citation-Based Summarization of Scientific Papers

This work presents an approach for producing readable and cohesive citation-based summaries and shows that the proposed approach outperforms several baselines in terms of both extraction quality and fluency.

Utilizing Microblogs for Automatic News Highlights Extraction

A novel method to improve news highlights extraction by using microblogs based on the hypothesis that microblog posts, although noisy, are not only indicative of important pieces of information in the news story, but also inherently “short and sweet” resulting from the artificial compression effect due to the length limit.

The CL-SciSumm Shared Task 2017: Results and Key Insights

This overview describes the official results of the CL-SciSumm Shared Task 2018 -- the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain and compares the participating systems in terms of two evaluation metrics.

Gibberish, Assistant, or Master?: Using Tweets Linking to News for Extractive Single-Document Summarization

This paper reveals the very basic value of tweets that can be utilized by regarding every tweet as a vote for candidate sentences, and resorts to unsupervised summarization models by leveraging the linking tweets to master the ranking of candidate extracts via random walk on a heterogeneous graph.

Fast Abstractive Summarization with Reinforce-Selected Sentence Rewriting

An accurate and fast summarization model that first selects salient sentences and then rewrites them abstractively to generate a concise overall summary is proposed, which achieves the new state-of-the-art on all metrics on the CNN/Daily Mail dataset, as well as significantly higher abstractiveness scores.

Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books

To align movies and books, a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book are proposed.