The challenging task of summary evaluation: an overview

@article{Lloret2018TheCT,
  title={The challenging task of summary evaluation: an overview},
  author={Elena Lloret and Laura Plaza and Ahmet Aker},
  journal={Language Resources and Evaluation},
  year={2018},
  volume={52},
  pages={101--148}
}
Abstract: Evaluation is crucial in the research and development of automatic summarization applications, in order to determine the appropriateness of a summary based on different criteria, such as the content it contains and the way it is presented. Performing an adequate evaluation is of great relevance to ensure that automatic summaries can be useful for the context and/or application they are generated for. To this end, researchers must be aware of the evaluation metrics, approaches, and …
Citations

Meeting Summarization, A Challenge for Deep Learning
TLDR: A short survey of deep learning approaches to abstractive text summarization that highlights the various challenges to be solved in the coming years before a text summarization tool can generate good-quality meeting summaries.
Best Practices for Crowd-based Evaluation of German Summarization: Comparing Crowd, Expert and Automatic Evaluation
TLDR: This work proposes crowdsourcing as a fast, scalable, and cost-effective alternative to expert evaluation for assessing the intrinsic and extrinsic quality of summarization, comparing crowd ratings with expert ratings and with automatic metrics such as ROUGE, BLEU, or BERTScore on a German summarization data set.
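A minimal sketch of the comparison methodology this entry describes: correlate crowd ratings and an automatic metric against expert ratings and see which tracks the experts better. All scores and values below are invented for illustration; they are not taken from the paper.

```python
# Hypothetical per-summary scores; in the paper these would come from a
# German summarization data set rated by experts, crowd workers, and
# automatic metrics such as ROUGE, BLEU, or BERTScore.
from scipy.stats import spearmanr

expert_ratings = [4.0, 3.5, 2.0, 4.5, 3.0]      # expert Likert judgments
crowd_ratings  = [3.8, 3.6, 2.4, 4.2, 2.9]      # averaged crowd judgments
rouge_scores   = [0.42, 0.38, 0.21, 0.47, 0.30] # automatic metric scores

# How well does each cheaper signal agree with the expert "gold" ratings?
rho_crowd, _ = spearmanr(expert_ratings, crowd_ratings)
rho_rouge, _ = spearmanr(expert_ratings, rouge_scores)
print(f"crowd vs. expert: Spearman rho = {rho_crowd:.2f}")
print(f"ROUGE vs. expert: Spearman rho = {rho_rouge:.2f}")
```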
A Crowdsourcing Approach to Evaluate the Quality of Query-based Extractive Text Summaries
TLDR: This work analyzes the feasibility and appropriateness of micro-task crowdsourcing for evaluating different summary quality characteristics and reports ongoing work on the crowdsourced evaluation of query-based extractive text summaries.
Evaluation of text summaries without human references based on the linear optimization of content metrics using a genetic algorithm
TLDR: A linear optimization of content-based metrics is proposed using a Genetic Algorithm (GA) to improve the correlation between automatic and manual evaluation.
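A minimal sketch of the idea named in this entry: evolve a weight vector over several content metrics so that the weighted combination correlates better with manual scores. The metric values, manual scores, and GA parameters below are fabricated for illustration; the paper's actual metrics and GA configuration are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows: summaries; columns: hypothetical content-based metric scores
# (e.g. n-gram overlap, cosine similarity, coverage).
metrics = rng.random((50, 3))
manual = metrics @ np.array([0.6, 0.3, 0.1]) + 0.05 * rng.standard_normal(50)

def fitness(w):
    """Pearson correlation between the weighted combination and manual scores."""
    return np.corrcoef(metrics @ w, manual)[0, 1]

# Tiny GA: truncation selection, blend crossover, Gaussian mutation.
pop = rng.random((40, 3))
for _ in range(100):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-10:]]          # keep the 10 best
    children = []
    for _ in range(30):
        a, b = parents[rng.integers(10, size=2)]
        child = 0.5 * (a + b) + 0.05 * rng.standard_normal(3)
        children.append(np.clip(child, 0, None))     # keep weights non-negative
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(w) for w in pop])]
print("best weights:", best / best.sum(), "correlation:", round(fitness(best), 3))
```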
Automatic text summarization: A comprehensive survey
TLDR: This research provides a comprehensive survey for researchers by presenting the different aspects of ATS (automatic text summarization): approaches, methods, building blocks, techniques, datasets, evaluation methods, and future research directions.
Optimizing Data-Driven Models for Summarization as Parallel Tasks
TLDR: This work tackles a hard optimization problem in computational linguistics, automatic multi-document text summarization, using grid computing, improving a Document Understanding Conference (DUC) benchmark recall metric over a previous setting.
A Summary Evaluation Method Combining Linguistic Quality and Semantic Similarity
  • Xingwen Wang, Bo Liu, Libin Shen, Yong Li, Rentao Gu, Guangzhi Qu
  • 2020 International Conference on Computational Science and Computational Intelligence (CSCI)
  • 2020
Summary evaluation methods are crucial to promoting the development of text summarization technologies. However, most existing summary evaluation methods seldom consider content integrity and …
Studying Summarization Evaluation Metrics in the Appropriate Scoring Range
TLDR: It is shown that, surprisingly, evaluation metrics which behave similarly on these datasets (the average-scoring range) strongly disagree in the higher-scoring range in which current systems now operate.
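A minimal sketch of the kind of analysis this entry describes: two metrics that agree over the full quality range can disagree once only high-scoring summaries are considered. The quality values and metric response curves below are simulated for illustration only.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
quality = rng.random(500)                                 # latent summary quality
metric_a = quality + 0.05 * rng.standard_normal(500)      # roughly linear metric
metric_b = quality**3 + 0.05 * rng.standard_normal(500)   # saturates differently

# Agreement over the full range vs. only the best summaries.
for name, mask in [("full range   ", np.ones(500, bool)),
                   ("top 10% only ", quality > np.quantile(quality, 0.9))]:
    rho, _ = spearmanr(metric_a[mask], metric_b[mask])
    print(f"{name}: Spearman rho = {rho:.2f}")
```

Restricting to the high-scoring slice shrinks the signal relative to the noise, so the metrics' rankings diverge even though they agree almost perfectly overall.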
Reliability of Human Evaluation for Text Summarization: Lessons Learned and Challenges Ahead
Only a small portion of research papers with human evaluation for text summarization provide information about the participant demographics, task design, and experiment protocol. Additionally, many …
Evaluation of Improved Components of AMIS Project for Speech Recognition, Machine Translation and Video/Audio/Text Summarization
TLDR: The researchers can state with certainty that the new development of scene 1, which received many negative evaluations from professionals, should be discontinued, but they cannot unambiguously recommend a single scenario as the only one for further development.

References

Showing 1-10 of 143 references
A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization
TLDR: A corpus of summaries produced by several state-of-the-art extractive summarization systems and popular baseline systems is presented to facilitate future research on generic summarization; it motivates the need for more sensitive evaluation measures and for approaches to system combination in summarization.
Text summarisation in progress: a literature review
TLDR: This paper contains a large literature review of the research field of Text Summarisation (TS) based on Human Language Technologies, explaining existing methodologies and systems as well as new research on the automatic evaluation of summary quality.
An Assessment of the Accuracy of Automatic Evaluation in Summarization
TLDR: An assessment of the automatic evaluations used for multi-document summarization of news, with recommendations on how any evaluation, manual or automatic, should be used to find statistically significant differences between summarization systems.
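A minimal sketch of one standard way to test whether two systems differ significantly, in the spirit of the recommendations this entry mentions: a paired bootstrap over per-document metric scores. The score lists are hypothetical stand-ins for, e.g., per-document ROUGE values; this is not the paper's exact procedure.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Fraction of resamples in which system A does NOT beat system B
    (an approximate one-sided p-value)."""
    rng = random.Random(seed)
    n = len(scores_a)
    losses = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]   # resample documents with replacement
        if sum(scores_a[i] for i in idx) <= sum(scores_b[i] for i in idx):
            losses += 1
    return losses / n_resamples

system_a = [0.41, 0.38, 0.45, 0.33, 0.40, 0.37, 0.44, 0.39]
system_b = [0.36, 0.37, 0.42, 0.31, 0.35, 0.38, 0.40, 0.36]
print("p ≈", paired_bootstrap(system_a, system_b))
```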
Automatic summarising: The state of the art
  • K. Spärck Jones
  • Computer Science
  • Inf. Process. Manag.
  • 2007
TLDR: The conclusions drawn are that automatic summarising has made valuable progress, with useful applications, better evaluation, and more task understanding, but summarising systems are still poorly motivated in relation to the factors affecting them, and evaluation needs to be taken much further to engage with the purposes summaries are intended to serve.
Automatic Summarization
TLDR: The challenges that remain open are discussed, in particular the need for language generation and the deeper semantic understanding of language that would be necessary for future advances in the field.
ROUGE-C: A fully automated evaluation method for multi-document summarization
TLDR: ROUGE-C applies the ROUGE method by replacing the reference summaries with the source document and query-focused information (if any), thereby enabling a way of evaluating multi-document summarization that is fully independent of human references.
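A toy sketch of the ROUGE-C idea named in this entry: score a candidate summary with a ROUGE-style n-gram overlap in which the source document stands in for the human reference. This computes only a ROUGE-1-style unigram overlap; it is not the paper's exact formula.

```python
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def rouge1_style(candidate, pseudo_reference):
    """Fraction of candidate unigrams matched in the pseudo-reference."""
    cand = Counter(tokens(candidate))
    ref = Counter(tokens(pseudo_reference))
    matched = sum(min(n, ref[w]) for w, n in cand.items())
    return matched / max(sum(cand.values()), 1)

# Hypothetical example: the source document replaces the reference summary.
source_document = ("Automatic summarization condenses a document while "
                   "preserving its most important content")
summary = "Summarization condenses a document, preserving important content."
print(f"ROUGE-C-style score: {rouge1_style(summary, source_document):.2f}")
```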
Recent automatic text summarization techniques: a survey
TLDR: A comprehensive survey of extractive text summarization approaches developed in the last decade is presented, along with a discussion of useful future directions that can help researchers identify areas where further research is needed.
Summarization Evaluation Methods: Experiments and Analysis
TLDR: The results show that different parameters of an experiment can affect how well a system scores, and describe how those parameters can be controlled to produce a sound evaluation.
Text summarization contribution to semantic question answering: New approaches for finding answers on the web
TLDR: The main goal of this paper is to determine to what extent TS can help semantic QA approaches when using summaries instead of search engine snippets as the corpus for answering questions.
Automatic Summary Evaluation without Human Models
TLDR: Results on a large-scale evaluation from the Text Analysis Conference show that input-summary comparisons can be very effective: they rank participating systems very similarly to manual model-based evaluations and to manual human judgments of summary quality made without reference to a model.
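A minimal sketch of one kind of input-summary comparison in the spirit of this entry: score a summary by the Jensen-Shannon divergence between its word distribution and the input's, where lower divergence suggests better content coverage. This is an illustrative reconstruction, not necessarily the exact measure evaluated in the paper.

```python
import math
import re
from collections import Counter

def word_dist(text):
    """Unigram probability distribution over a text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence between two word distributions (bits)."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0) + q.get(w, 0)) for w in vocab}
    def kl(a):
        return sum(a.get(w, 0) * math.log2(a.get(w, 0) / m[w])
                   for w in vocab if a.get(w, 0) > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

# Hypothetical input document and candidate summary.
input_text = "The storm closed roads and schools across the region on Monday."
summary = "The storm closed roads and schools."
print(f"JS divergence (lower is better): "
      f"{js_divergence(word_dist(summary), word_dist(input_text)):.3f}")
```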