DUC 2005: Evaluation of Question-Focused Summarization Systems

  • Hoa Trang Dang
  • Published 23 July 2006
  • Computer Science
The Document Understanding Conference (DUC) 2005 evaluation had a single user-oriented, question-focused summarization task, which was to synthesize from a set of 25–50 documents a well-organized, fluent answer to a complex question. The evaluation shows that the best summarization systems have difficulty extracting relevant sentences in response to complex questions (as opposed to representative sentences that might be appropriate to a generic summary). The relatively generous allowance of…


Scaling Up Query-Focused Summarization to Meet Open-Domain Question Answering

This work combines passage retrieval with text generation to produce a summary of the retrieved passages given an input query, and shows that a few training samples are sufficient to fine-tune a large generative model on retrieved passages.

Guiding Extractive Summarization with Question-Answering Rewards

This paper argues that quality summaries should serve as document surrogates that answer important questions, notes that such question-answer pairs can be conveniently obtained from human abstracts, and trains a system to promote summaries that are informative, fluent, and perform competitively on question answering.

An Extractive Text Summarizer Based on Significant Words

A new quantification measure for word significance in natural language processing (NLP) tasks is proposed and successfully applied to an extractive text summarization approach, achieving state-of-the-art performance.

Query Focused Multi-Document Summarization with Distant Supervision

This work proposes a coarse-to-fine modeling framework that introduces separate modules for estimating whether segments are relevant to the query, likely to contain an answer, and central, and demonstrates that this framework outperforms strong comparison systems on standard QFS benchmarks.

Text Summarization with Latent Queries

LAQSUM is the first unified text summarization system that learns latent queries from documents for abstractive summarization under a deep generative framework, supporting any existing query form and allowing users to plug and play queries of any type at test time.

Coarse-to-Fine Query Focused Multi-Document Summarization

This work proposes a coarse-to-fine modeling framework that employs progressively more accurate modules for estimating whether text segments are relevant, likely to contain an answer, and central, and presents an instantiation of this framework with a trained evidence estimator.

EntSUM: A Data Set for Entity-Centric Summarization

This work introduces a human-annotated data set (EntSUM) for controllable summarization with a focus on named entities as the aspects to control, and proposes extensions to state-of-the-art summarization approaches that achieve substantially better results on this data set.

Unsupervised Multi-Document Summarization

The results suggest that unsupervised systems can compete with supervised systems in generating summaries, and that a system's abstractiveness correlates negatively with the amount of information its summaries retain, which matters given that information content is the most important aspect of a summary.

Document Summarization with Latent Queries

This framework formulates summarization as a generative process, jointly optimizing a latent query model and a conditional language model, and outperforms strong comparison systems across benchmarks, query types, document settings, and target domains.

References

The Effects of Human Variation in DUC Summarization Evaluation

This paper examines how variation in human judgments does and does not affect the results, and the interpretation of those results, when evaluating the output of automatic text summarization systems.

ROUGE: A Package for Automatic Evaluation of Summaries

Four different ROUGE measures are introduced: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S, all included in the ROUGE summarization evaluation package, together with their evaluations.
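As a rough illustration of what ROUGE-N computes, a minimal sketch of the recall-oriented n-gram overlap is shown below. This is an illustrative reimplementation with hypothetical function names, not the official ROUGE package, which adds stemming, stopword handling, and multi-reference jackknifing:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Recall-oriented ROUGE-N: clipped n-gram overlap between candidate
    and reference, divided by the number of n-grams in the reference."""
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # Counter & Counter clips to min counts
    return overlap / max(sum(ref.values()), 1)

print(rouge_n("the cat sat on the mat", "the cat lay on the mat"))  # 5/6 ≈ 0.833
```

Using recall against the reference (rather than precision against the candidate) rewards summaries that cover the reference content, which is why ROUGE-N correlates well with coverage-based human judgments.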

Evaluating DUC 2005 using Basic Elements

It is shown that this method correlates better with human judgments than any other automated procedure to date, and overcomes the subjectivity/variability problems of manual methods that require humans to preprocess summaries to be evaluated.

An Empirical Study of Information Synthesis Task

This paper describes an empirical study of the "Information Synthesis" task, defined as the process of extracting, organizing, and inter-relating the pieces of information contained in a set of relevant documents, in order to obtain a comprehensive, non-redundant report that satisfies the information need.

Applying the Pyramid Method in DUC 2005

A modified pyramid score is found to give good results and would simplify peer annotation in the future; high score correlations between sets from different annotators, together with good interannotator agreement, indicate that participants can perform annotation reliably.
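The modified pyramid score can be sketched roughly as follows: the total weight of the summary content units (SCUs) found in a peer summary is divided by the weight of an ideally informative summary containing X SCUs, where X is the average SCU count of the model summaries. This is an illustrative reconstruction with hypothetical names, not the DUC 2005 scoring code:

```python
def modified_pyramid_score(matched_weights, pyramid_weights, x):
    """Sketch of the modified pyramid score.

    matched_weights: weights of the SCUs a peer summary expresses.
    pyramid_weights: weights of all SCUs in the pyramid.
    x: average number of SCUs in the model summaries.
    """
    # An ideal summary of size x expresses the x highest-weighted SCUs.
    ideal = sum(sorted(pyramid_weights, reverse=True)[:x])
    return sum(matched_weights) / ideal

# A peer matching SCUs of weight 4, 3, and 1 against a 7-SCU pyramid:
print(modified_pyramid_score([4, 3, 1], [4, 4, 3, 2, 2, 1, 1], x=3))  # 8/11 ≈ 0.727
```

Normalizing by an ideal summary of fixed size X, rather than by the total pyramid weight, keeps the score from penalizing peers for content that no summary of typical length could include.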

The effect of topic set size on retrieval experiment error

TREC results are used to empirically derive error rates as a function of the number of topics in a test and the observed difference in average scores; the rates indicate that researchers must take care when concluding one method is better than another, especially when few topics are used.