Corpus ID: 222124366

Evaluating Models' Local Decision Boundaries via Contrast Sets

  title={Evaluating Models' Local Decision Boundaries via Contrast Sets},
  author={M. Gardner and Yoav Artzi and Jonathan Berant and Ben Bogin and Sihao Chen and Dheeru Dua and Yanai Elazar and Ananth Gottumukkala and Nitish Gupta and Hanna Hajishirzi and Gabriel Ilharco and Daniel Khashabi and Kevin Lin and Jiangming Liu and Nelson F. Liu and Phoebe Mulcaire and Qiang Ning and S. Singh and N. A. Smith and Sanjay Subramanian and Eric Wallace and A. Zhang and Ben Zhou},
  journal={arXiv: Computation and Language},
  • Published 2020
  • Computer Science
  • arXiv: Computation and Language
  • Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture a dataset's intended capabilities. We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data. In particular, after a dataset is constructed, we recommend that the…

    Publications referenced by this paper.
    • DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs (123 citations)
    • "Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding (18 citations)
    • A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning (16 citations, highly influential)
    • Deep contextualized word representations (4,146 citations, highly influential)
    • Learning the Difference that Makes a Difference with Counterfactually-Augmented Data (62 citations, highly influential)
    • Reasoning Over Paragraph Effects in Situations (21 citations, highly influential)
    • Seeing Things from a Different Angle: Discovering Diverse Perspectives about Claims (20 citations)
    • A Corpus for Reasoning About Natural Language Grounded in Photographs (66 citations)