MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics

@inproceedings{Chen2020MOCHAAD,
  title={MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics},
  author={Anthony Chen and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
  booktitle={EMNLP},
  year={2020}
}
Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers. However, progress is impeded by existing generation metrics, which rely on token overlap and are agnostic to the nuances of reading comprehension. To address this, we introduce a benchmark for training and evaluating generative reading comprehension metrics: MOdeling Correctness with Human Annotations. MOCHA contains 40K human…
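
As a rough illustration of the limitation the abstract points to, the sketch below computes a standard SQuAD-style token-level F1 between a predicted and a reference answer. The example answers are made up for illustration and are not drawn from MOCHA; the point is only that a paraphrased but correct answer can receive a low score under token overlap.

# Minimal sketch of token-overlap scoring (SQuAD-style token F1).
# Illustrative only; not code from the MOCHA paper or dataset.
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# A semantically correct paraphrase shares almost no tokens with the
# reference, so token F1 is low even though a human would accept it.
print(token_f1("he was too exhausted to continue", "because he ran out of energy"))  # ~0.17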