Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization

by Demian Gholipour Ghalandari
The centroid-based model for extractive document summarization is a simple and fast baseline that ranks sentences based on their similarity to a centroid vector. In this paper, we apply this ranking to possible summaries instead of sentences and use a simple greedy algorithm to find the best summary. Furthermore, we show possibilities to scale up to larger input document collections by selecting a small number of sentences from each document prior to constructing the summary. Experiments…
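The greedy summary-level ranking described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: it uses plain bag-of-words vectors and cosine similarity, and the helper names (`bow`, `greedy_centroid_summary`) are hypothetical.

```python
from collections import Counter
import math

def bow(sentence):
    """Bag-of-words vector of a whitespace-tokenized sentence."""
    return Counter(sentence.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def greedy_centroid_summary(sentences, max_sents=2):
    """Greedily add the sentence that maximizes the similarity of the
    *whole summary's* vector to the centroid of all input sentences,
    i.e. the ranking is applied to candidate summaries, not sentences."""
    centroid = Counter()
    for s in sentences:
        centroid.update(bow(s))
    summary, summary_vec = [], Counter()
    candidates = list(sentences)
    while candidates and len(summary) < max_sents:
        best = max(candidates,
                   key=lambda s: cosine(summary_vec + bow(s), centroid))
        summary.append(best)
        summary_vec.update(bow(best))
        candidates.remove(best)
    return summary
```

Scoring the summary vector (rather than each sentence in isolation) is what distinguishes this variant from the classic sentence-level centroid ranking.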
Unsupervised Aspect-Based Multi-Document Abstractive Summarization
This work addresses opinion summarization, a multi-document summarization task, with an unsupervised abstractive summarization neural system based on a language model meant to encode reviews to a vector space and to generate fluent sentences from the same vector space.
An unsupervised method for extractive multi-document summarization based on centroid approach and sentence embeddings
An unsupervised method for generic extractive multi-document summarization based on sentence-embedding representations and the centroid approach; it outperforms several state-of-the-art methods and achieves promising results compared with the best-performing methods, including supervised deep-learning-based ones.
Examining the State-of-the-Art in News Timeline Summarization
This paper compares different TLS strategies using appropriate evaluation frameworks and proposes a simple and effective combination of methods that improves over the state-of-the-art on all tested benchmarks.
Myanmar news summarization using different word representations
Myanmar local and international news are summarized using a centroid-based word-embedding summarizer; the word-embedding representation performs comprehensively better than bag-of-words summarization.
Self-Supervised and Controlled Multi-Document Opinion Summarization
This work proposes a self-supervised setup that considers an individual document as a target summary for a set of similar documents, which makes training simpler than previous approaches by relying only on standard log-likelihood loss and mainstream models.
Summarize Dates First: A Paradigm Shift in Timeline Summarization
This paper presents a new approach, namely Summarize Date First, which focuses on first generating date-level summaries and then selecting the most relevant dates on top of summarized knowledge; it is superior to state-of-the-art unsupervised methods and competitive against supervised ones.
A Proposal: Interactively Learning to Summarise Timelines by Reinforcement Learning
Timeline Summarisation (TLS) aims to generate a concise, time-ordered list of events described in sources such as news articles. However, current systems do not provide an adequate way to adapt to…
Deep submodular network: An application to multi-document summarization
A deep submodular network (DSN) is introduced, a deep network meeting submodularity characteristics that lets modular and submodular features participate in constructing a tailored model that best fits a problem.
Identifying Implicit Quotes for Unsupervised Extractive Summarization of Conversations
Two topics are discussed: one is whether quote extraction is an important factor for summarization, and the other is whether the model can capture salient sentences that conventional methods cannot.
Developing and Orchestrating a Portfolio of Natural Legal Language Processing and Document Curation Services
This article presents a portfolio of natural legal language processing and document curation services currently under development in a collaborative European project that is being deployed in different prototype applications using a flexible and scalable microservices architecture.


Exploring Content Models for Multi-Document Summarization
We present an exploration of generative probabilistic models for multi-document summarization. Beginning with a simple word frequency based model (Nenkova and Vanderwende, 2005), we construct a…
Centroid-based summarization of multiple documents
A multi-document summarizer, MEAD, is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system, and an evaluation scheme based on sentence utility and subsumption is applied.
Centroid-based Text Summarization through Compositionality of Word Embeddings
This paper proposes a centroid-based method for text summarization that exploits the compositional capabilities of word embeddings and achieves good performance even in comparison to more complex deep learning models.
A Class of Submodular Functions for Document Summarization
A class of submodular functions for document summarization tasks that combine two terms: one encourages the summary to be representative of the corpus, and the other positively rewards diversity. For such functions, an efficient, scalable greedy optimization scheme has a constant-factor guarantee of optimality.
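The constant-factor guarantee mentioned above comes from the classic greedy algorithm for monotone submodular maximization, which achieves a (1 − 1/e) approximation under a cardinality budget. A minimal sketch with a toy word-coverage objective (the function names here are hypothetical, not the paper's):

```python
def greedy_submodular(candidates, objective, budget):
    """Standard greedy maximization of a monotone submodular objective:
    repeatedly add the element with the largest marginal gain until the
    budget is exhausted or no element improves the objective."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < budget:
        best = max(remaining,
                   key=lambda x: objective(selected + [x]) - objective(selected))
        if objective(selected + [best]) - objective(selected) <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected

def word_coverage(summary):
    """Toy monotone submodular objective: number of distinct word types
    covered, rewarding representativeness and implicitly diversity."""
    return len({w for s in summary for w in s.split()})
```

Because marginal gains of a submodular function only shrink as the summary grows, the greedy choice at each step is what yields the approximation guarantee.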
Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization
We develop a Ranking framework upon Recursive Neural Networks (R2N2) to rank sentences for multi-document summarization. It formulates the sentence ranking task as a hierarchical regression process,…
A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization
A corpus of summaries produced by several state-of-the-art extractive summarization systems or by popular baseline systems is presented to facilitate future research on generic summarization; it motivates the need for more sensitive evaluation measures and for approaches to system combination in summarization.
ROUGE: A Package for Automatic Evaluation of Summaries
Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, included in the ROUGE summarization evaluation package, along with their evaluations.
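Of the measures above, ROUGE-N is the simplest: n-gram overlap between a candidate and a reference summary. A simplified single-reference recall sketch (the real package also reports precision and F-score, handles multiple references, and applies stemming/stopword options):

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: fraction of reference n-grams matched by the
    candidate, with per-n-gram clipping so repeated candidate n-grams
    cannot be counted more often than they occur in the reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    if not ref:
        return 0.0
    overlap = sum(min(c, ref[g]) for g, c in cand.items() if g in ref)
    return overlap / sum(ref.values())
```

For example, `rouge_n_recall("the cat sat", "the cat sat on the mat", n=1)` matches 3 of the 6 reference unigram tokens, giving 0.5.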
Improving the Estimation of Word Importance for News Multi-Document Summarization
A supervised model for ranking word importance that incorporates a rich set of features is proposed; it is superior to prior approaches for identifying words used in human summaries, and an extractive summarizer that incorporates the estimated word importance produces summaries comparable to the state of the art under automatic evaluation.
LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
A new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences is considered, and the LexRank-with-threshold method outperforms the other degree-based techniques, including continuous LexRank.
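The thresholded LexRank variant described above can be sketched as a PageRank-style power iteration over a binarized sentence-similarity graph. This is an illustrative simplification (the similarity matrix is assumed given; the function name and parameter defaults are hypothetical):

```python
def lexrank(similarity, threshold=0.1, damping=0.85, iters=50):
    """Sketch of thresholded LexRank: binarize the sentence-similarity
    matrix with a threshold, then run power iteration to approximate the
    eigenvector-centrality salience score of each sentence."""
    n = len(similarity)
    # Binary adjacency: an edge wherever similarity clears the threshold.
    adj = [[1.0 if similarity[i][j] >= threshold and i != j else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) or 1.0 for row in adj]  # guard isolated nodes
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(adj[j][i] / degree[j] * scores[j]
                                  for j in range(n))
                  for i in range(n)]
    return scores
```

Continuous LexRank differs only in keeping the raw similarity weights instead of the binary edges.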
Experiments in Newswire Summarisation
This paper investigates extractive multi-document summarisation algorithms over newswire corpora, validating that automatic summarisation evaluation is a useful proxy for manual evaluation, and verifying that several state-of-the-art systems with similar automatic evaluation scores create different summaries from one another.