Quantifying the Limits and Success of Extractive Summarization Systems Across Domains

Abstract

This paper analyzes the topic identification stage of single-document automatic text summarization across four domains: newswire, literary, scientific, and legal documents. We explore the summary space of each domain via an exhaustive search strategy and find the probability density function (pdf) of the ROUGE score distribution for each domain. We then use this pdf to calculate the percentile rank of extractive summarization systems. Our results introduce a new way to judge the success of automatic summarization systems and offer quantified explanations for questions such as why systems to date have struggled to achieve a statistically significant improvement over the lead baseline in the news domain.
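
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified unigram-recall score standing in for ROUGE, a toy four-sentence document, and brute-force enumeration that is only practical for very short inputs. All function names (`rouge1_recall`, `summary_space_scores`, `percentile_rank`) and the example data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def rouge1_recall(candidate, reference):
    """Simplified stand-in for ROUGE-1: clipped unigram recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / max(sum(ref.values()), 1)


def summary_space_scores(sentences, reference, k):
    """Score every k-sentence extract, keeping sentences in document order."""
    return {
        idx: rouge1_recall(" ".join(sentences[i] for i in idx), reference)
        for idx in combinations(range(len(sentences)), k)
    }


def percentile_rank(score, all_scores):
    """Percent of extracts scoring below `score`, counting ties as half."""
    below = sum(s < score for s in all_scores)
    ties = sum(s == score for s in all_scores)
    return 100.0 * (below + 0.5 * ties) / len(all_scores)


# Toy document and reference summary (illustrative data only).
sentences = [
    "the cat sat on the mat",
    "dogs bark loudly",
    "the mat was red",
    "birds fly south",
]
reference = "the cat sat on the red mat"

# Exhaustively score all 2-sentence extracts (the "summary space").
scores = summary_space_scores(sentences, reference, k=2)

# The lead baseline takes the first k sentences of the document.
lead_score = scores[(0, 1)]
lead_rank = percentile_rank(lead_score, list(scores.values()))
print(f"lead score = {lead_score:.3f}, percentile rank = {lead_rank:.1f}")
```

The empirical distribution of `scores.values()` plays the role of the pdf here. On realistic documents the number of extracts grows combinatorially, so direct enumeration like this would need pruning or a more efficient search.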

Cite this paper

@inproceedings{Ceylan2010QuantifyingTL, title={Quantifying the Limits and Success of Extractive Summarization Systems Across Domains}, author={Hakan Ceylan and Rada Mihalcea and Umut {\"O}zertem and Elena Lloret and Manuel Palomar}, booktitle={HLT-NAACL}, year={2010} }