Corpus ID: 16092357

Multi-document summaries using n-gram graphs : salience and redundancy

By George Giannakopoulos, George A. Vouros and Vangelis Karkaletsis
This paper describes a summarization system that aims to provide a set of language-independent and generic methods for generating extractive summaries. The proposed methods are realized as operators on a generic character n-gram graph representation of texts, towards the selection of content and removal of redundancy. This work defines the set of generic operators upon n-gram graphs and proposes a number of ways for using these operators within the summarization process. The experimental results… 
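The character n-gram graph representation at the heart of the paper can be sketched as follows: nodes are character n-grams, and weighted edges connect n-grams that co-occur within a small window. This is an illustrative implementation under assumed parameter choices, not the authors' code; the function name and defaults are hypothetical.

```python
from collections import defaultdict

def char_ngram_graph(text, n=3, window=3):
    """Build a character n-gram graph: nodes are character n-grams,
    and each edge weight counts how often two n-grams occur within
    `window` positions of each other. A sketch of the general
    representation; the paper's exact windowing may differ."""
    ngrams = [text[i:i + n] for i in range(len(text) - n + 1)]
    graph = defaultdict(float)  # (ngram, ngram) -> co-occurrence weight
    for i, g in enumerate(ngrams):
        for j in range(i + 1, min(i + 1 + window, len(ngrams))):
            graph[(g, ngrams[j])] += 1.0
    return graph
```

Summary operators (merging, intersecting, or subtracting graphs) then work directly on these edge-weight dictionaries.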
2 Citations


Better Metrics to Automatically Predict the Quality of a Text Summary
A family of metrics is demonstrated for estimating the quality of a text summary relative to one or more human-generated summaries, based on features automatically computed from the summaries to measure content and linguistic quality.


References

Summarization system evaluation revisited: N-gram graphs
A novel automatic method for the evaluation of summarization systems, based on comparing the character n-gram graph representations of the extracted summaries against a number of model summaries, whose evaluation performance matches and even exceeds that of other contemporary evaluation methods.
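Comparing two n-gram graphs typically reduces to an edge-overlap score. The sketch below follows the general value-similarity idea from the n-gram graph framework (shared edges contribute the ratio of their smaller to larger weight, normalized by the larger graph's size); the exact normalization in the paper may differ.

```python
def value_similarity(g1, g2):
    """Value similarity between two edge-weighted graphs, given as
    dicts mapping edge -> weight. Each shared edge contributes
    min(w1, w2) / max(w1, w2); the sum is normalized by the size of
    the larger graph. A sketch, not the paper's exact definition."""
    if not g1 and not g2:
        return 1.0
    common = set(g1) & set(g2)
    score = sum(min(g1[e], g2[e]) / max(g1[e], g2[e]) for e in common)
    return score / max(len(g1), len(g2), 1)
```

Identical graphs score 1.0; graphs with no shared edges score 0.0, which makes the measure usable as a drop-in similarity for evaluation.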
Using N-Grams To Understand the Nature of Summaries
Empirically characterizes human-written summaries provided in a widely used summarization corpus and suggests that extraction-based techniques which have been successful for single-document summarization may not be sufficient when summarizing multiple documents.
Multi-Document Summarization by Graph Search and Matching
A new method for summarizing similarities and differences in a pair of related documents using a graph representation for text using a spreading activation technique to discover nodes semantically related to the topic.
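The spreading-activation step mentioned above can be sketched generically: activation starts at topic seed nodes and propagates to neighbors with a decay factor, surfacing nodes related to the topic. This is a minimal, assumed formulation (function name, decay rule, and iteration count are hypothetical), not the paper's algorithm.

```python
def spread_activation(adj, seeds, decay=0.5, iterations=3):
    """Simple spreading activation over an adjacency dict
    {node: [(neighbor, edge_weight), ...]}. Seed nodes start at
    activation 1.0; each pass propagates decayed activation to
    neighbors. A generic sketch of the technique only."""
    activation = {node: 0.0 for node in adj}
    for s in seeds:
        activation[s] = 1.0
    for _ in range(iterations):
        updated = dict(activation)
        for node, a in activation.items():
            if a > 0:
                for neighbor, weight in adj[node]:
                    updated[neighbor] = max(updated[neighbor], decay * weight * a)
        activation = updated
    return activation
```

Nodes with high final activation are the ones "semantically related to the topic" in the sense the description uses.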
Testing the Use of N-gram Graphs in Summarization Sub-tasks
This study elaborates on query expansion, content matching and filtering, and redundancy removal, as well as summary evaluation, focusing on how the tools were used within the TAC 2008 summarization update challenge.
Sentence Fusion for Multidocument News Summarization
This article introduces sentence fusion, a novel text-to-text generation technique for synthesizing common information across documents that moves the summarization field from the use of purely extractive methods to the generation of abstracts that contain sentences not found in any of the input documents and can synthesize information across sources.
The use of MMR, diversity-based reranking for reordering documents and producing summaries
This paper presents a method for combining query-relevance with information-novelty in the context of text retrieval and summarization, and preliminary results indicate some benefits for MMR diversity ranking in document retrieval and in single document summarization.
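The MMR criterion described above is a greedy trade-off between query relevance and novelty: at each step, pick the candidate maximizing λ·sim(d, query) − (1−λ)·max over selected d' of sim(d, d'). A minimal sketch (the similarity matrices and λ value are illustrative inputs, not from the paper):

```python
def mmr(query_sim, pairwise_sim, k, lam=0.7):
    """Maximal Marginal Relevance: greedily select k items that are
    relevant to the query yet novel w.r.t. items already chosen.
    query_sim[i] is sim(d_i, query); pairwise_sim[i][j] is
    sim(d_i, d_j). Returns indices in selection order."""
    candidates = set(range(len(query_sim)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # redundancy = similarity to the closest already-selected item
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With λ near 1 the ranking is pure relevance; lowering λ penalizes items similar to what is already in the summary, which is the diversity-based reranking the paper evaluates.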
Using Latent Semantic Analysis in Text Summarization and Summary Evaluation
This paper deals with using latent semantic analysis in text summarization. We describe a generic text summarization method which uses the latent semantic analysis technique to identify semantically important sentences.
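The LSA approach can be sketched as an SVD of a term-by-sentence matrix, ranking sentences by their weight in the top latent topics. This follows the general LSA summarization idea under assumed scoring details; the function name and the combined-topic scoring rule are illustrative, not the paper's exact method.

```python
import numpy as np

def lsa_rank_sentences(term_sentence, k=2):
    """Rank sentences by their magnitude in the top-k latent topics
    of a term-by-sentence matrix: take the SVD, weight the sentence
    loadings by the singular values, and score each sentence by its
    length in the reduced topic space. A sketch of the technique."""
    U, s, Vt = np.linalg.svd(term_sentence, full_matrices=False)
    k = min(k, len(s))
    # per-sentence score: norm of the singular-value-weighted loadings
    scores = np.sqrt(((s[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))
    return np.argsort(-scores)  # sentence indices, best first
```

The top-ranked sentence indices are then extracted in document order to form the summary.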
Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies
A multi-document summarizer, called MEAD, is presented, which generates summaries using cluster centroids produced by a topic detection and tracking system and two new techniques, based on sentence utility and subsumption, are described.
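The centroid idea behind MEAD can be sketched in a few lines: build an aggregate word-frequency vector for the cluster and score each sentence by its overlap with it. This is a deliberately simplified sketch; MEAD also combines positional and first-sentence-overlap features, which are omitted here, and the function name is hypothetical.

```python
from collections import Counter

def centroid_scores(sentences):
    """Score each sentence by the summed centroid weight of its
    distinct words, where the centroid is the cluster-wide word
    frequency vector. A minimal sketch of centroid-based scoring."""
    tokenized = [s.lower().split() for s in sentences]
    centroid = Counter(word for tokens in tokenized for word in tokens)
    return [sum(centroid[w] for w in set(tokens)) for tokens in tokenized]
```

Sentences closest to the centroid are assumed to carry the cluster's central content and are extracted first.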
A system for query-specific document summarization
A method to create query-specific summaries by identifying the most query-relevant fragments and combining them using the semantic associations within the document by calculating the top spanning trees on the document graphs is presented.
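The spanning-tree step mentioned above can be illustrated generically with a Kruskal-style maximum spanning tree over a weighted document graph. This shows only the tree construction under assumed input conventions; selecting query-relevant fragments and ranking multiple trees, as the paper does, are separate concerns not modeled here.

```python
def max_spanning_tree(edges, n):
    """Kruskal-style maximum spanning tree over a graph of n nodes.
    `edges` is a list of (weight, u, v) tuples. Uses union-find with
    path compression to skip cycle-forming edges. A generic sketch."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges, reverse=True):  # heaviest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree
```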
Overview of the TAC 2008 Update Summarization Task
While all of the 71 submitted runs were automatically scored with the ROUGE and BE metrics, NIST assessors manually evaluated only 57 of the submitted runs for readability, content, and overall responsiveness.