Evaluating Content Selection in Summarization: The Pyramid Method

@inproceedings{Nenkova2004EvaluatingCS,
  title={Evaluating Content Selection in Summarization: The Pyramid Method},
  author={A. Nenkova and R. Passonneau},
  booktitle={NAACL},
  year={2004}
}
We present an empirically grounded method for evaluating content selection in summarization. [...] Our method quantifies the relative importance of facts to be conveyed. We argue that it is reliable, predictive and diagnostic, and thus improves considerably on the shortcomings of the human evaluation method currently used in the Document Understanding Conference.
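The scoring idea behind the Pyramid method can be illustrated with a short sketch: Summary Content Units (SCUs) are weighted by how many human model summaries express them, and a peer summary is scored against the best weight achievable with the same number of SCUs. The code below is a minimal sketch under that reading; the function and variable names are ours, not the paper's, and SCU annotation itself is assumed to have been done by hand.

```python
# Minimal sketch of original pyramid scoring; names are illustrative, and SCU
# annotation is assumed to have been carried out manually beforehand.
from collections import Counter

def scu_weights(model_summaries):
    """Weight each SCU by the number of model (human) summaries expressing it."""
    counts = Counter()
    for scus in model_summaries:          # each element is a set of SCU identifiers
        counts.update(scus)
    return dict(counts)

def pyramid_score(peer_scus, weights):
    """Observed SCU weight in the peer summary divided by the maximum weight
    achievable with the same number of SCUs."""
    observed = sum(weights.get(scu, 0) for scu in peer_scus)
    top = sorted(weights.values(), reverse=True)[:len(peer_scus)]
    optimal = sum(top)
    return observed / optimal if optimal else 0.0

# Four model summaries; the peer expresses three SCUs, one of them unsupported.
models = [{"scu1", "scu2"}, {"scu1", "scu3"}, {"scu1", "scu2", "scu4"}, {"scu2", "scu3"}]
weights = scu_weights(models)                             # scu1: 3, scu2: 3, scu3: 2, scu4: 1
print(pyramid_score({"scu1", "scu4", "scu5"}, weights))   # (3 + 1 + 0) / (3 + 3 + 2) = 0.5
```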
Automatically Evaluating Content Selection in Summarization without Human Models
TLDR
This work capitalizes on the assumption that the distribution of words in the input and an informative summary of that input should be similar to each other, and ranks participating systems similarly to manual model-based pyramid evaluation and to manual human judgments of responsiveness.
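The model-free approach above compares the word distribution of the input with that of the summary; a common instantiation of this idea is Jensen-Shannon divergence over unigram distributions, where lower divergence indicates a more representative summary. The sketch below follows that reading; the tokenization and smoothing choices are our own illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: Jensen-Shannon divergence between input and summary unigram
# distributions; smoothing and tokenization are illustrative assumptions.
import math
from collections import Counter

def unigram_dist(text, vocab):
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    # Light smoothing over a shared vocabulary so both distributions cover it.
    return {w: (counts.get(w, 0) + 0.5) / (total + 0.5 * len(vocab)) for w in vocab}

def js_divergence(p, q):
    def kl(a, b):
        return sum(a[w] * math.log2(a[w] / b[w]) for w in a)
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def input_summary_divergence(input_text, summary_text):
    vocab = set(input_text.lower().split()) | set(summary_text.lower().split())
    p, q = unigram_dist(input_text, vocab), unigram_dist(summary_text, vocab)
    return js_divergence(p, q)   # lower = summary word distribution closer to the input
```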
Pyramid-based Summary Evaluation Using Abstract Meaning Representation
TLDR
The proposed metric complements the widely used ROUGE metrics well and automates the evaluation process, requiring no manual intervention on the evaluated summary side.
Improving Content Selection for Update Summarization with Subtopic-Enriched Sentence Ranking Functions
TLDR
This work proposes enriching traditional summarization approaches with subtopic representations, coherent textual segments of one or more consecutive sentences, which improve the quality of the produced summary and show high recall values.
Automated Pyramid Summarization Evaluation
TLDR
An automated method is presented that is more efficient, more transparent, and more complete than previous automated pyramid methods; it is tested on a new dataset of student summaries and on historical NIST data from extractive summarizers.
Satisfying information needs with multi-document summaries
TLDR
This novel framework for summarization has the advantage of producing highly responsive summaries, as indicated by the evaluation results.
The Pyramid Method: Incorporating human content selection variation in summarization evaluation
TLDR
This article proposes a method for analysis of multiple human abstracts into semantic content units, which serves as the basis for an evaluation method that incorporates the observed variation and is predictive of different equally informative summaries.
Bayesian Summarization at DUC and a Suggestion for Extrinsic Evaluation
We describe our entry into the Document Understanding Conference competition for evaluating query-focused multidocument summarization systems. Our system is based on a Bayesian query-focused [...]
Multilingual Summarization Evaluation without Human Models
TLDR
This work applies a new content-based evaluation framework called Fresa to compute a variety of divergences among probability distributions in text summarization tasks, including generic and focus-based multi-document summarization in English and generic single-document summarization in French and Spanish.
Merging Multiple Features to Evaluate the Content of Text Summary
TLDR
This method operates by combining multiple features to build models that predict the PYRAMID scores for new summaries and has achieved good performance in predicting the content score for a summary as well as for a summarization system.
Improving Update Summarization by Revisiting the MMR Criterion
TLDR
A modified Maximal Marginal Relevance-like criterion, called Smmr, is used to select sentences that are close to the topic and, at the same time, distant from sentences used in already-read documents.
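For readers unfamiliar with the MMR-style selection this entry modifies, the sketch below shows the generic greedy loop: each step picks the sentence most relevant to the topic while penalizing similarity to material already selected or, in the update setting, to sentences from already-read documents. The Jaccard similarity, parameter names, and trade-off weight are simplifying assumptions, not the Smmr formulation itself.

```python
# Minimal MMR-style selection sketch; the Jaccard similarity and the lambda
# trade-off are illustrative assumptions, not the Smmr criterion itself.
def similarity(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (len(wa | wb) or 1)

def mmr_select(sentences, topic, history, k=3, lam=0.7):
    """Greedily pick k sentences relevant to the topic and non-redundant with
    both already-selected sentences and sentences from already-read documents."""
    selected, candidates = [], list(sentences)
    while candidates and len(selected) < k:
        def score(s):
            relevance = similarity(s, topic)
            redundancy = max((similarity(s, t) for t in selected + history), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```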

References

Evaluation Challenges in Large-Scale Document Summarization
TLDR
A large-scale meta-evaluation of eight evaluation measures for both single-document and multi-document summarizers is presented, showing the strengths and drawbacks of all evaluation methods and how they rank the different summarizers.
Summarization Evaluation Methods: Experiments and Analysis
TLDR
The results show that different parameters of an experiment can affect how well a system scores, and describe how parameters can be controlled to produce a sound evaluation.
Manual and automatic evaluation of summaries
TLDR
This paper shows the instability of the manual evaluation of summaries, and investigates the feasibility of automated summary evaluation based on the recent BLEU method from machine translation using accumulative n-gram overlap scores between system and human summaries.
Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics
TLDR
The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
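As a concrete illustration of the unigram co-occurrence statistic discussed in this reference, the sketch below computes a simple ROUGE-1-style recall: the fraction of reference-summary unigrams also found in the system summary. It is only an approximation of the idea, not the official ROUGE toolkit, and the function name is ours.

```python
# Minimal ROUGE-1-style recall sketch (clipped unigram overlap / reference length);
# an illustration of the idea, not the official ROUGE implementation.
from collections import Counter

def rouge1_recall(system_summary, reference_summary):
    sys_counts = Counter(system_summary.lower().split())
    ref_counts = Counter(reference_summary.lower().split())
    overlap = sum(min(c, sys_counts.get(w, 0)) for w, c in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

print(rouge1_recall("the cat sat on the mat", "a cat sat on a mat"))  # 4 of 6 reference unigrams -> 0.667
```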
Examining the consensus between human summaries: initial experiments with factoid analysis
We present a new approach to summary evaluation which combines two novel aspects, namely (a) content comparison between gold standard summary and system summary via factoids, a pseudo-semantic [...]
The formation of abstracts by the selection of sentences
TLDR
There was very little agreement between the subjects and machine methods in their selection of representative sentences, and human selection of sentences is considerably more variable than the machine methods.
Bleu: a Method for Automatic Evaluation of Machine Translation
TLDR
This work proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Computing Reliability for Coreference Annotation
TLDR
The solution I present accommodates a wide range of coding choices for the annotator, while preserving the same units across codings, and permits a straightforward application of reliability measurement in coreference annotation.
Content analysis: an introduction to its methodology
History, Conceptual Foundations, Uses and Kinds of Inference, The Logic of Content Analysis Designs, Unitizing, Sampling, Recording, Data Languages, Constructs for Inference, Analytical Techniques, The Use of [...]
Introduction to Statistical Analysis