Data-driven Summarization of Scientific Articles

@article{Nikolov2018DatadrivenSO,
  title={Data-driven Summarization of Scientific Articles},
  author={Nikola I. Nikolov and Michael Pfeiffer and Richard H. R. Hahnloser},
  journal={ArXiv},
  year={2018},
  volume={abs/1804.08875}
}
Data-driven approaches to sequence-to-sequence modelling have been successfully applied to short text summarization of news articles. [...] We generate two novel multi-sentence summarization datasets from scientific articles and test the suitability of a wide range of existing extractive and abstractive neural network-based summarization approaches. Our analysis demonstrates that scientific papers are suitable for data-driven text summarization. Our results could serve as valuable benchmarks for…
Abstractive Document Summarization without Parallel Data
TLDR
This work develops an abstractive summarization system that relies only on large collections of example summaries and non-matching articles, consisting of an unsupervised sentence extractor that selects salient sentences to include in the final summary, as well as a sentence abstractor that is trained on pseudo-parallel and synthetic data.
Two Huge Title and Keyword Generation Corpora of Research Articles
TLDR
Two huge datasets for text summarization and keyword generation research, containing 34 million and 23 million records, respectively, are introduced, and topic modeling is applied to the two sets to derive subsets of research articles from more specific disciplines.
Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization
TLDR
This work proposes a multi-view sequence-to-sequence model by first extracting conversational structures of unstructured daily chats from different views to represent conversations and then utilizing a multi-view decoder to incorporate different views to generate dialogue summaries.
VNDS: A Vietnamese Dataset for Summarization
TLDR
This paper creates a standard dataset for document summarization and is the first to formally publish the large benchmark dataset of summarization, and makes a comparison of traditional and state-of-the-art extractive and abstractive summarization on it.
TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks
TLDR
This paper proposes a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences, and hypothesizes that such talks constitute a coherent and concise description of the papers’ content, and can form the basis for good summaries.
A Novel Wikipedia based Dataset for Monolingual and Cross-Lingual Summarization
Cross-lingual summarization is a challenging task for which there are no cross-lingual scientific resources currently available. To overcome the lack of a high-quality resource, we present a new
Text Summarization System: An Extractive Approach using Hierarchical Text Clustering
TLDR
An unsupervised text mining model is developed for clustering and summarizing texts and is deployed into a web-based system for summarizing large documents.
Cooperative Generator-Discriminator Networks for Abstractive Summarization with Narrative Flow
TLDR
To promote research toward abstractive summarization with narrative flow, a new dataset is introduced, Scientific Abstract SummarieS (SASS), where the abstracts are used as proxy gold summaries for scientific articles and Co-opNet is proposed, a novel transformer-based framework where the generator works with the discourse discriminator to compose a long-form summary.
Exploiting pivot words to classify and summarize discourse facets of scientific papers
TLDR
A new, more effective solution to the CL-SciSumm discourse facet classification task, which entails identifying, for each cited text span, which facet of the paper it belongs to from a predefined set of facets, and extracting facet-specific descriptions of each RP, each consisting of a fixed-length collection of the RP’s text spans.

References

A Supervised Approach to Extractive Summarisation of Scientific Papers
TLDR
This paper introduces a new dataset for summarisation of computer science publications by exploiting a large resource of author provided summaries and develops models on the dataset making use of both neural sentence encoding and traditionally used summarisation features.
Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
TLDR
This work proposes several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.
A Neural Attention Model for Abstractive Sentence Summarization
TLDR
This work proposes a fully data-driven approach to abstractive sentence summarization by utilizing a local attention-based model that generates each word of the summary conditioned on the input sentence.
Centroid-based Text Summarization through Compositionality of Word Embeddings
TLDR
This paper proposes a centroid-based method for text summarization that exploits the compositional capabilities of word embeddings and achieves good performance even in comparison to more complex deep learning models.
Towards Abstraction from Extraction: Multiple Timescale Gated Recurrent Unit for Summarization
TLDR
The proposed Multiple Timescale model of the Gated Recurrent Unit is implemented in the encoder-decoder setting to better deal with the presence of multiple compositionalities in larger texts.
Automatic Summarization
TLDR
The challenges that remain open are discussed, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field.
Summarizing Scientific Articles: Experiments with Relevance and Rhetorical Status
TLDR
This article provides a gold standard for summaries of this kind consisting of a substantial corpus of conference articles in computational linguistics annotated with human judgments of the rhetorical status and relevance of each sentence in the articles.
COMPENDIUM: A text summarization system for generating abstracts of research papers
Sequence to Sequence Learning with Neural Networks
TLDR
This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
TLDR
A new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences is considered and the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank.