DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

@inproceedings{Dua2019DROPAR,
  title={DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs},
  author={Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
  booktitle={NAACL-HLT},
  year={2019}
}
Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 96k-question benchmark, a system must resolve references in a…

Key Quantitative Results

  • We apply state-of-the-art methods from both the reading comprehension and semantic parsing literature to this dataset and show that the best systems achieve only 32.7% F1 on our generalized accuracy metric, while expert human performance is 96.4%.
  • The paper's proposed numerically-aware model reaches 47% F1, a 14.3% absolute increase over the best baseline system.
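The F1 scores above are computed with a token-overlap metric. DROP's official evaluation is more involved (it handles numeric answers and sets of spans specially), but the core idea is bag-of-tokens F1 between a predicted and a gold answer string, as in SQuAD-style evaluation. A minimal sketch, with an illustrative function name and simplified tokenization:

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Simplified bag-of-tokens F1 between a predicted and a gold answer.

    Note: this is an illustrative sketch, not DROP's official scorer,
    which additionally normalizes numbers and aligns sets of spans.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, predicting "12 yards" against the gold answer "12" yields precision 0.5 and recall 1.0, so F1 ≈ 0.67; an exact match scores 1.0.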
