Best practices for the human evaluation of automatically generated text

@inproceedings{Lee2019BestPF,
  title={Best practices for the human evaluation of automatically generated text},
  author={C. Lee and Albert Gatt and Emiel van Miltenburg and S. Wubben and E. Krahmer},
  booktitle={INLG},
  year={2019}
}
Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated. While there is some agreement regarding automatic metrics, there is a high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how human evaluation is currently conducted, and presents a set of best practices, grounded in the literature. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG. 
52 Citations
A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems
Evaluation of Text Generation: A Survey
Automating Text Naturalness Evaluation of NLG Systems
Human or Machine: Automating Human Likeliness Evaluation of NLG Texts
Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions