TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks

@inproceedings{Lev2019TalkSummAD,
  title={TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks},
  author={Guy Lev and Michal Shmueli-Scheuer and Jonathan Herzig and Achiya Jerbi and David Konopnicki},
  booktitle={ACL},
  year={2019}
}
Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers’ content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper… 
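The abstract is truncated above. At a high level, the approach treats the conference talk as an approximate spoken summary of the paper and scores paper sentences by how well the transcript covers them. The following is a minimal hypothetical sketch of that idea using plain lexical overlap; the function name and scoring are illustrative assumptions and stand in for the paper's actual transcript-to-sentence alignment model:

```python
# Hypothetical sketch only: score each paper sentence by the average
# transcript frequency of its tokens, then take the top-k sentences
# as a pseudo-summary. This stands in for the paper's actual
# transcript-to-sentence alignment, which is not reproduced here.
from collections import Counter

def score_by_transcript(paper_sentences, transcript, k=10):
    talk_counts = Counter(transcript.lower().split())
    scored = []
    for sent in paper_sentences:
        tokens = sent.lower().split()
        if not tokens:
            continue
        score = sum(talk_counts[t] for t in tokens) / len(tokens)
        scored.append((score, sent))
    # Sentences the speaker dwells on are assumed to be summary-worthy.
    return [s for _, s in sorted(scored, key=lambda x: x[0], reverse=True)[:k]]
```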
Citations

Semi-automatic Labelling of Scientific Articles using Deep Learning to Enlarge Benchmark Data for Scientific Summarization
TLDR
This research proposal intends to apply deep learning methods to enlarge a small seed corpus of annotated scientific articles using semi-supervised/automatic annotation approaches, and to measure the quality of the annotated corpus on downstream informative summarization using various evaluation techniques.
Extractive Research Slide Generation Using Windowed Labeling Ranking
TLDR
A method that automatically generates slides for scientific articles, based on a corpus of 5,000 paper-slide pairs compiled from conference proceedings websites, and outperforms several baseline methods, including SummaRuNNer, by a significant margin in terms of ROUGE score.
Unsupervised document summarization using pre-trained sentence embeddings and graph centrality
TLDR
A method for incorporating sentence embeddings produced by deep language models into extractive summarization techniques based on graph centrality, in an unsupervised manner; it can summarize any kind of document of any size and satisfy any length constraint on the summaries produced.
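As a generic sketch of this family of methods (assuming pre-computed sentence embeddings as a NumPy array; not this paper's exact system), one can build a cosine-similarity graph over sentences and rank them with PageRank-style centrality:

```python
# Generic embedding-based graph-centrality summarizer (LexRank-style
# sketch). `embeddings` is an (n_sentences, dim) NumPy array from any
# pre-trained sentence encoder; the encoder itself is assumed given.
import numpy as np

def centrality_summary(sentences, embeddings, k=3, d=0.85, iters=50):
    unit = embeddings / np.clip(
        np.linalg.norm(embeddings, axis=1, keepdims=True), 1e-9, None)
    sim = np.maximum(unit @ unit.T, 0.0)   # non-negative cosine similarities
    np.fill_diagonal(sim, 0.0)
    trans = sim / np.clip(sim.sum(axis=1, keepdims=True), 1e-9, None)
    n = len(sentences)
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):                 # PageRank power iteration
        rank = (1 - d) / n + d * (trans.T @ rank)
    top = np.argsort(-rank)[:k]
    return [sentences[i] for i in sorted(top)]  # restore document order
```

A length constraint, as mentioned in the TLDR, would replace the fixed k with a loop that adds top-ranked sentences until a word budget is exhausted.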
LongSumm 2021: Session based automatic summarization model for scientific document
TLDR
This paper proposes a session-based automatic summarization model (SBAS) that uses a session and ensemble mechanism to generate long summaries, achieving the best performance in the LongSumm task.
Scientific Document Summarization for LaySumm ’20 and LongSumm ’20
Automatic text summarization has been widely studied as an important task in natural language processing. Traditionally, various feature engineering and machine learning based systems have been…
SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
TLDR
SciSummPip is an unsupervised text summarization pipeline for scientific documents that includes a transformer-based language model, SciBERT, for contextual sentence representation, content selection with PageRank, sentence graph construction with both deep and linguistic information, and within-graph summary generation.
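As a rough, simplified sketch of such a pipeline's shape (a hypothetical simplification: this clusters sentence embeddings and picks one representative sentence per cluster, whereas the actual system builds a sentence graph and generates summary content within it):

```python
# Hypothetical simplification of a cluster-then-select pipeline. The
# real SciSummPip constructs a sentence graph and generates summaries
# within the graph; here we just cluster sentence embeddings and keep
# the sentence nearest each cluster centroid.
import numpy as np
from sklearn.cluster import KMeans

def cluster_summary(sentences, embeddings, n_clusters=5):
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings)
    picked = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        picked.append(members[np.argmin(dists)])
    return [sentences[i] for i in sorted(picked)]
```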
On Generating Extended Summaries of Long Documents
TLDR
This paper exploits the hierarchical structure of documents and incorporates it into an extractive summarization model through a multi-task learning approach, showing that multi-tasking can adjust the extraction probability distribution in favor of summary-worthy sentences across diverse sections.
Monash-Summ@LongSumm 20 SciSummPip: An Unsupervised Scientific Paper Summarization Pipeline
TLDR
SciSummPip, an unsupervised text summarization system for scientific papers inspired by SummPip, a multi-document summarizer for the news domain; it includes a transformer-based language model, SciBERT, for contextual sentence representation and content selection, and applies a summary length constraint to adapt to the scientific domain.
GUIR @ LongSumm 2020: Learning to Generate Long Summaries from Scientific Documents
TLDR
A new multi-task approach for incorporating document structure into the summarizer outperforms the two other methods by large margins and ranks top according to ROUGE-1 score while staying competitive in terms of ROUGE-2.
Summaformers @ LaySumm 20, LongSumm 20
TLDR
This paper distinguishes between two types of summaries: a very short summary that captures the essence of the research paper in layman's terms, and a much longer, detailed summary aimed at providing specific insights into the various ideas touched upon in the paper.

References

A Supervised Approach to Extractive Summarisation of Scientific Papers
TLDR
This paper introduces a new dataset for summarisation of computer science publications, built by exploiting a large resource of author-provided summaries, and develops models on the dataset that make use of both neural sentence encoding and traditionally used summarisation features.
ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
TLDR
The first large-scale manually-annotated corpus for scientific papers is developed and released by enabling faster annotation, and summarization methods that integrate the authors’ original highlights and the article’s actual impacts on the community are proposed to create comprehensive, hybrid summaries.
Neural Summarization by Extracting Sentences and Words
TLDR
This work develops a general framework for single-document summarization composed of a hierarchical document encoder and an attention-based extractor that allows for different classes of summarization models which can extract sentences or words.
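As an illustrative skeleton of this encode-then-extract pattern (a minimal sketch, not the paper's architecture; the sentence encoder here is a crude embedding average rather than the hierarchical encoder described above, and all hyperparameters are illustrative assumptions):

```python
# Minimal encode-then-extract sketch in PyTorch: embed words, average
# them into sentence vectors, contextualize sentences with a GRU, and
# emit a per-sentence extraction probability.
import torch
import torch.nn as nn

class SentenceExtractor(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.doc_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.score = nn.Linear(hid_dim, 1)

    def forward(self, docs):
        # docs: (batch, n_sents, n_words) tensor of word ids.
        words = self.embed(docs)          # (batch, n_sents, n_words, emb)
        sents = words.mean(dim=2)         # crude sentence encoding
        states, _ = self.doc_rnn(sents)   # document-level context
        return torch.sigmoid(self.score(states)).squeeze(-1)

# Usage: probs = SentenceExtractor(30000)(torch.randint(1, 30000, (2, 12, 20)))
```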
Data-driven Summarization of Scientific Articles
TLDR
This work generates two novel multi-sentence summarization datasets from scientific articles and tests the suitability of a wide range of existing extractive and abstractive neural network-based summarization approaches, demonstrating that scientific papers are suitable for data-driven text summarization.
Coherent Citation-Based Summarization of Scientific Papers
TLDR
This work presents an approach for producing readable and cohesive citation-based summaries and shows that the proposed approach outperforms several baselines in terms of both extraction quality and fluency.
Utilizing Microblogs for Automatic News Highlights Extraction
TLDR
A novel method to improve news highlights extraction by using microblogs based on the hypothesis that microblog posts, although noisy, are not only indicative of important pieces of information in the news story, but also inherently “short and sweet” resulting from the artificial compression effect due to the length limit.
The CL-SciSumm Shared Task 2017: Results and Key Insights
TLDR
This overview describes the official results of the CL-SciSumm Shared Task 2017, the first medium-scale shared task on scientific document summarization in the computational linguistics (CL) domain, and compares the participating systems in terms of two evaluation metrics.
Scientific document summarization via citation contextualization and scientific discourse
TLDR
This work presents a framework for scientific summarization which takes advantage of the citations and the scientific discourse structure, and proposes three approaches for contextualizing citations which are based on query reformulation, word embeddings, and supervised learning.
Overview of the CL-SciSumm 2016 Shared Task
TLDR
This overview paper describes the participation and the official results of the second CL-SciSumm Shared Task, organized as a part of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2016), held in New Jersey, USA, in June 2016.
Gibberish, Assistant, or Master?: Using Tweets Linking to News for Extractive Single-Document Summarization
TLDR
This paper reveals the basic value of tweets, which can be utilized by regarding every tweet as a vote for candidate sentences, and resorts to unsupervised summarization models that leverage the linking tweets to rank candidate extracts via random walks on a heterogeneous graph.
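The "tweet as a vote" idea lends itself to a compact sketch (hypothetical Jaccard-overlap voting; the paper's actual models rank via random walks on a heterogeneous graph):

```python
# Hypothetical voting sketch: each tweet votes for the candidate news
# sentence it overlaps with most (Jaccard over word sets). This is a
# stand-in for the paper's graph-based ranking, not a reproduction.
def tweet_vote_scores(sentences, tweets):
    def jaccard(a, b):
        sa, sb = set(a.lower().split()), set(b.lower().split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
    votes = [0] * len(sentences)
    for tweet in tweets:
        best = max(range(len(sentences)),
                   key=lambda i: jaccard(sentences[i], tweet))
        votes[best] += 1
    return votes  # higher vote counts mark summary-worthy sentences
```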