Evaluating Machines by their Real-World Language Use

@article{Zellers2020EvaluatingMB,
  title={Evaluating Machines by their Real-World Language Use},
  author={Rowan Zellers and Ari Holtzman and Elizabeth Clark and Lianhui Qin and Ali Farhadi and Yejin Choi},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.03607}
}
There is a fundamental gap between how humans understand and use language -- in open-ended, real-world situations -- and today's NLP benchmarks for language understanding. To narrow this gap, we propose to evaluate machines by their success at real-world language use -- which greatly expands the scope of language tasks that can be measured and studied. We introduce TuringAdvice, a new challenge for language understanding systems. Given a complex situation faced by a real person, a machine must…
Cited By

Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover
GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation
TellMeWhy: A Dataset for Answering Why-Questions in Narratives
Evaluation of Text Generation: A Survey
Forecasting AI Progress: A Research Agenda
Measuring Massive Multitask Language Understanding
What Will it Take to Fix Benchmarking in Natural Language Understanding?
Experience Grounds Language
