Sentence Centrality Revisited for Unsupervised Summarization

Hao Zheng and Mirella Lapata
Single document summarization has enjoyed renewed interest in recent years thanks to the popularity of neural network models and the availability of large-scale datasets. In this paper we develop an unsupervised approach arguing that it is unrealistic to expect large-scale and high-quality training data to be available or created for different types of summaries, domains, or languages. We revisit a popular graph-based ranking algorithm and modify how node (aka sentence) centrality is computed… 
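The modified centrality the abstract alludes to can be illustrated with a small sketch: score each sentence by a directed variant of degree centrality over a sentence-similarity graph, weighting edges to earlier and later sentences differently. The function name, toy similarity matrix, and lambda weights below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def directed_centrality(sim, lambda_back=-0.5, lambda_fwd=1.0):
    """Score each sentence by a directed variant of degree centrality.

    sim: (n, n) sentence-similarity matrix (e.g. cosine over embeddings).
    Edges to preceding and following sentences get different weights;
    the lambda values here are illustrative, not tuned.
    """
    n = sim.shape[0]
    scores = np.zeros(n)
    for i in range(n):
        back = sim[i, :i].sum()      # similarity to earlier sentences
        fwd = sim[i, i + 1:].sum()   # similarity to later sentences
        scores[i] = lambda_back * back + lambda_fwd * fwd
    return scores

# Toy document: 4 "sentences" with a hand-made symmetric similarity matrix.
sim = np.array([
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.3, 0.2],
    [0.5, 0.3, 1.0, 0.7],
    [0.4, 0.2, 0.7, 1.0],
])
ranking = np.argsort(-directed_centrality(sim))  # best-first sentence order
```

With these weights, sentences whose similarity mass points toward later content (typically early sentences) score highest, unlike undirected degree centrality, which treats both directions equally.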


Improving Unsupervised Extractive Summarization with Facet-Aware Modeling

Experimental results show that the novel facet-aware centrality-based ranking model consistently outperforms strong baselines, especially in long- and multi-document scenarios, and even performs comparably to some supervised models.

HipoRank: Incorporating Hierarchical and Positional Information into Graph-based Unsupervised Long Document Extractive Summarization

This work proposes a novel graph-based ranking model for unsupervised extractive summarization of long documents that leverages positional and hierarchical information grounded in discourse structure to augment a document's graph representation with hierarchy and directionality.

Centrality Meets Centroid: A Graph-based Approach for Unsupervised Document Summarization

This paper proposes a graph-based unsupervised approach for extractive document summarization that works at a summary-level by utilizing graph centrality and centroid.

Discourse-Aware Unsupervised Summarization for Long Scientific Documents

This work proposes an unsupervised graph-based ranking model for extractive summarization of long scientific documents, and suggests that patterns in the discourse structure are a strong signal for determining importance in scientific articles.

SUPERT: Towards New Frontiers in Unsupervised Evaluation Metrics for Multi-Document Summarization

This work proposes SUPERT, which rates the quality of a summary by measuring its semantic similarity with a pseudo reference summary, i.e. selected salient sentences from the source documents, using contextualized embeddings and soft token alignment techniques.

Tweet-aware News Summarization with Dual-Attention Mechanism

This paper focuses on the unsupervised summarization problem by exploring news and readers’ comments in linking tweets, i.e., tweets with URLs linking to the news, and proposes position-dependent word salience, which reflects the effect of local context.


TED, a transformer-based unsupervised summarization system with pretraining on large-scale data, is proposed; it leverages the lead bias in news articles to pretrain the model on large-scale corpora and finetunes TED on target domains through theme modeling and a denoising autoencoder to enhance the quality of summaries.

Scientific Paper Extractive Summarization Enhanced by Citation Graphs

This work focuses on leveraging citation graphs to improve paper extractive summarization under different settings and proposes a Graph-based Supervised Summarization model (GSS), which introduces a gated sentence encoder and a graph information fusion module to take advantage of the graph information to polish the sentence representation.

Unsupervised Summarization with Customized Granularities

This paper proposes GranuSum, the first unsupervised multi-granularity summarization framework, which takes events as the basic semantic units of the source documents, ranks these events by their salience, and develops a model to summarize input documents with given events as anchors and hints.

SAPGraph: Structure-aware Extractive Summarization for Scientific Papers with Heterogeneous Graph

SAPGraph is a scientific paper extractive summarization framework based on a structure-aware heterogeneous graph, which models the document into a graph with three kinds of nodes and edges based on structure information of facets and knowledge.



LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

A new approach, LexRank, computes sentence importance based on the concept of eigenvector centrality in a graph representation of sentences; the LexRank with threshold method outperforms the other degree-based techniques, including continuous LexRank.
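The eigenvector-centrality idea behind LexRank can be sketched as a PageRank-style power iteration over a thresholded sentence-similarity graph. The function below is a minimal sketch; the threshold and damping defaults are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def lexrank(sim, threshold=0.1, damping=0.85, tol=1e-6):
    """PageRank-style power iteration over a thresholded similarity graph.

    Binarize the sentence-similarity matrix at `threshold`, row-normalize
    the result into a transition matrix, and iterate to the stationary
    distribution. Threshold/damping values are illustrative defaults.
    """
    adj = (sim >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)             # no self-loops
    row_sums = adj.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # guard isolated sentences
    transition = adj / row_sums
    n = sim.shape[0]
    scores = np.full(n, 1.0 / n)
    while True:
        new = (1 - damping) / n + damping * transition.T @ scores
        if np.abs(new - scores).sum() < tol:
            return new
        scores = new
```

Raising the threshold sparsifies the graph, which is exactly the "LexRank with threshold" variant the summary contrasts with continuous LexRank (which would keep the raw similarity weights instead of binarizing).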

Unsupervised Neural Multi-document Abstractive Summarization

The proposed end-to-end neural model architecture for unsupervised abstractive summarization is applied to business and product reviews; the generated summaries are shown to be fluent, relevant in terms of word overlap, representative of the average sentiment of the input documents, and highly abstractive compared to baselines.

An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model

A document-level reconstruction framework named DocRebuild is proposed, which reconstructs the documents with summary sentences through a neural document model and selects summary sentences to minimize the reconstruction error.

An Exploration of Document Impact on Graph-Based Multi-Document Summarization

A document-based graph model is proposed to incorporate the document-level information and the sentence-to-document relationship into the graph-based ranking process and the results show the robustness of the proposed model.

Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

A novel abstractive model is proposed which is conditioned on the article’s topics and based entirely on convolutional neural networks, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.

Multi-document summarization using cluster-based link analysis

Experimental results on the DUC2001 and DUC2002 datasets demonstrate the effectiveness of the proposed summarization models and show that the ClusterCMRW model is more robust than the ClusterHITS model with respect to different cluster numbers.

Automatic Text Summarization of Newswire: Lessons Learned from the Document Understanding Conference

An overview of the results achieved in the different types of summarization tasks is given, comparing both the broader classes of baselines, systems, and humans, as well as individual pairs of summarizers (both human and automatic).

Topical Coherence for Graph-based Extractive Summarization

We present an approach for extractive single-document summarization. Our approach is based on a weighted graphical representation of documents obtained by topic modeling. We optimize importance…

Optimizing Sentence Modeling and Selection for Document Summarization

This paper attempts to build a strong summarizer, DivSelect+CNNLM, by presenting new algorithms to optimize sentence modeling and selection; it proposes CNNLM, a novel neural network language model (NNLM) based on a convolutional neural network (CNN), to project sentences into dense distributed representations, and then models sentence redundancy by cosine similarity.
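The cosine-similarity redundancy modeling mentioned above can be sketched as a generic MMR-style greedy selector: each candidate's salience is penalized by its maximum cosine similarity to sentences already picked. The function `select_diverse` and its parameters are hypothetical illustrations, not the DivSelect algorithm itself.

```python
import numpy as np

def select_diverse(sent_vecs, salience, k=3, redundancy_weight=0.5):
    """Greedily pick k sentences, trading salience against redundancy.

    Redundancy of a candidate is its maximum cosine similarity to the
    sentences already selected (an MMR-style choice); the weighting and
    names here are hypothetical, not the DivSelect algorithm.
    """
    vecs = sent_vecs / np.linalg.norm(sent_vecs, axis=1, keepdims=True)
    chosen, remaining = [], list(range(len(salience)))
    while remaining and len(chosen) < k:
        def score(i):
            if not chosen:
                return salience[i]
            redundancy = max(float(vecs[i] @ vecs[j]) for j in chosen)
            return salience[i] - redundancy_weight * redundancy
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Given two near-duplicate high-salience sentences and one distinct lower-salience sentence, the selector picks one duplicate and then the distinct sentence, since the second duplicate's score is discounted by its high cosine similarity to the first.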

Automatic Summarization

The challenges that remain open, in particular the need for language generation and deeper semantic understanding of language that would be necessary for future advances in the field are discussed.