DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
- Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner
- North American Chapter of the Association for Computational Linguistics (NAACL)
- 1 March 2019
A new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs, is introduced, along with a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
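As a toy illustration of the discrete operations DROP targets (not the paper's model), the sketch below answers a "how many more yards" style question by extracting two numbers from a passage and subtracting them; the regex heuristic, function names, and passage are invented for this example.

```python
import re

def drop_style_difference(passage: str, mention_a: str, mention_b: str) -> int:
    """Answer a 'how many more ...' question by pulling the numbers
    attached to two surface mentions and subtracting them -- the kind
    of discrete operation DROP requires. Purely illustrative."""
    def number_before(mention: str) -> int:
        # Find a number immediately preceding the mention, e.g. "35-yard".
        match = re.search(r"(\d+)[\s-]*" + re.escape(mention), passage)
        if match is None:
            raise ValueError(f"no number found before {mention!r}")
        return int(match.group(1))
    return number_before(mention_a) - number_before(mention_b)

passage = "Brady threw a 35-yard touchdown pass; later Smith added a 12-yard run."
print(drop_style_difference(passage, "yard touchdown", "yard run"))  # 23
```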
Generating Natural Adversarial Examples
- Zhengli Zhao, Dheeru Dua, Sameer Singh
- International Conference on Learning Representations (ICLR)
- 31 October 2017
This paper proposes a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in semantic space of dense and continuous data representation, utilizing the recent advances in generative adversarial networks.
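The core search idea can be sketched as follows, assuming you already have a trained GAN generator, an inverter mapping inputs to latent codes, and a target classifier (all hypothetical callables here, not part of any specific library): sample latent perturbations of growing radius around the inverted input and return the first decoded sample that flips the prediction.

```python
import numpy as np

def natural_adversary(x, classifier, generator, inverter,
                      step=0.01, n_samples=64, max_radius=1.0,
                      rng=np.random.default_rng(0)):
    """Sketch of search in a GAN's latent space: invert x to a latent
    code z, then probe latent neighborhoods of growing radius for a
    decoded sample that changes the classifier's prediction."""
    z = inverter(x)
    y = classifier(x)
    r = step
    while r <= max_radius:
        # Sample candidate latents on an L2 sphere of radius r around z.
        noise = rng.normal(size=(n_samples, z.shape[-1]))
        noise *= r / np.linalg.norm(noise, axis=-1, keepdims=True)
        for z_tilde in z + noise:
            x_tilde = generator(z_tilde)
            if classifier(x_tilde) != y:
                return x_tilde  # a natural-looking adversary on the data manifold
        r += step
    return None  # no adversary found within the search radius
```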
Evaluating Models’ Local Decision Boundaries via Contrast Sets
- Matt Gardner, Yoav Artzi, Ben Zhou
- Findings of the Association for Computational Linguistics: EMNLP
- 6 April 2020
A more rigorous annotation paradigm for NLP is proposed that helps close systematic gaps in the test data; it is recommended that dataset authors manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.
Dynamic Sampling Strategies for Multi-Task Reading Comprehension
- Ananth Gottumukkala, Dheeru Dua, Sameer Singh, Matt Gardner
- Annual Meeting of the Association for Computational Linguistics (ACL)
- 1 July 2020
This work shows that a simple dynamic sampling strategy, selecting instances for training proportional to the multi-task model's current performance on a dataset relative to its single-task performance, gives substantive gains over prior multi-task sampling strategies, mitigating the catastrophic forgetting that is common in multi-task learning.
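A minimal sketch of such a performance-gap sampling rule, assuming datasets are weighted by how far the multi-task model currently lags its single-task reference (the paper's exact weighting may differ; scores and names below are invented):

```python
import numpy as np

def dynamic_sampling_weights(multi_task_scores, single_task_scores):
    """Weight each dataset by the gap between the multi-task model's
    current score and its single-task reference, so under-performing
    datasets are sampled more often. An assumed proportional form."""
    gaps = {d: max(single_task_scores[d] - multi_task_scores[d], 0.0)
            for d in single_task_scores}
    total = sum(gaps.values()) or 1.0
    return {d: g / total for d, g in gaps.items()}

single = {"DROP": 60.0, "SQuAD": 90.0, "Quoref": 70.0}   # single-task F1
current = {"DROP": 40.0, "SQuAD": 88.0, "Quoref": 55.0}  # multi-task F1 so far
weights = dynamic_sampling_weights(current, single)

# Draw the next training batch's dataset according to the gap weights.
datasets = list(weights)
rng = np.random.default_rng(0)
next_dataset = rng.choice(datasets, p=[weights[d] for d in datasets])
```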
Comprehensive Multi-Dataset Evaluation of Reading Comprehension
- Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Matt Gardner, Sameer Singh
- Conference on Empirical Methods in Natural Language Processing (EMNLP)
- 1 November 2019
An evaluation server, ORB, is presented that reports performance on seven diverse reading comprehension datasets, encouraging and facilitating testing of a single model's capability to understand a wide variety of reading phenomena.
Benefits of Intermediate Annotations in Reading Comprehension
- Dheeru Dua, Sameer Singh, Matt Gardner
- Annual Meeting of the Association for Computational Linguistics (ACL)
- 1 July 2020
It is observed that for any collection budget, spending a fraction of it on intermediate annotations results in improved model performance on two complex compositional datasets, DROP and Quoref.
PoMo: Generating Entity-Specific Post-Modifiers in Context
- Jun Seok Kang, Robert L. Logan IV, Niranjan Balasubramanian
- North American Chapter of the Association for Computational Linguistics (NAACL)
- 5 April 2019
PoMo, a post-modifier dataset, is built automatically from news articles, reflecting a journalistic need to incorporate entity information relevant to a particular news event.
Learning with Instance Bundles for Reading Comprehension
- Dheeru Dua, Pradeep Dasigi, Sameer Singh, Matt Gardner
- Conference on Empirical Methods in Natural Language Processing (EMNLP)
- 18 April 2021
Drawing on ideas from contrastive estimation, several new supervision losses are introduced that compare question-answer scores across multiple related instances, and normalize these scores across various neighborhoods of closely contrasting questions and/or answers.
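The shape of one such loss can be sketched as follows, assuming each bundle provides a score matrix over closely related question-answer pairs; the softmax-over-bundle form here is an illustrative stand-in for the paper's family of losses, and all names are invented.

```python
import torch
import torch.nn.functional as F

def bundle_contrastive_loss(scores: torch.Tensor, gold_index: torch.Tensor):
    """Contrastive-estimation style loss over an instance bundle:
    scores[i][j] is the model's unnormalized score for answer j to
    question i within a bundle of closely related instances.
    Normalizing each question's scores over the bundle's candidate
    answers pushes probability mass toward the gold pairing."""
    log_probs = F.log_softmax(scores, dim=-1)  # normalize across the bundle
    return F.nll_loss(log_probs, gold_index)   # NLL of each gold answer

# Toy bundle: 2 contrasting questions x 2 candidate answers.
scores = torch.tensor([[2.0, 0.5],
                       [0.3, 1.8]])
gold = torch.tensor([0, 1])  # question 0 -> answer 0, question 1 -> answer 1
loss = bundle_contrastive_loss(scores, gold)
```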