Annotation Artifacts in Natural Language Inference Data

@article{Gururangan2018AnnotationAI,
  title={Annotation Artifacts in Natural Language Inference Data},
  author={Suchin Gururangan and Swabha Swayamdipta and Omer Levy and Roy Schwartz and Samuel R. Bowman and Noah A. Smith},
  journal={ArXiv},
  year={2018},
  volume={abs/1803.02324}
}
Large-scale datasets for natural language inference are created by presenting crowd workers with a sentence (premise), and asking them to generate three new sentences (hypotheses) that it entails, contradicts, or is logically neutral with respect to. We show that, in a significant portion of such data, this protocol leaves clues that make it possible to identify the label by looking only at the hypothesis, without observing the premise. Specifically, we show that a simple text categorization…
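The hypothesis-only finding can be illustrated with a minimal sketch (this is a toy model on invented example sentences, not the authors' actual classifier or data): a unigram Naive Bayes model trained on hypotheses alone. If such a premise-blind model beats the majority-class baseline, the hypotheses must leak label information — e.g., negation words correlating with "contradiction".

```python
# Hedged sketch of a hypothesis-only baseline: a unigram Naive Bayes
# classifier that never sees the premise. Training data below is invented
# to mimic reported artifact patterns (negation -> contradiction,
# generic statements -> entailment, added detail -> neutral).
from collections import Counter, defaultdict
import math

train = [
    ("a person is outside", "entailment"),
    ("someone is moving", "entailment"),
    ("nobody is outside", "contradiction"),
    ("the man is not moving", "contradiction"),
    ("a woman is eating lunch at a cafe", "neutral"),
    ("the dog is waiting for its owner", "neutral"),
]

def train_nb(data, alpha=1.0):
    """Count label priors and per-label word frequencies (Laplace-smoothed)."""
    label_counts = Counter(label for _, label in data)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in data:
        for w in text.split():
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab, alpha

def predict(model, hypothesis):
    """Return the most probable label given ONLY the hypothesis text."""
    label_counts, word_counts, vocab, alpha = model
    total = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label, lc in label_counts.items():
        lp = math.log(lc / total)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for w in hypothesis.split():
            lp += math.log((word_counts[label][w] + alpha) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb(train)
print(predict(model, "nobody is moving"))  # negation cue -> contradiction
```

On these toy sentences, the negation word "nobody" alone is enough to flip the prediction to "contradiction" — the same kind of premise-free cue the paper identifies at dataset scale.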

Citations

Publications citing this paper.
SHOWING 1-10 OF 89 CITATIONS

DRr-Net: Dynamic Re-Read Network for Sentence Semantic Matching

  • AAAI
  • 2019

Misleading Failures of Partial-input Baselines

Image-Enhanced Multi-level Sentence Representation Net for Natural Language Inference

  • 2018 IEEE International Conference on Data Mining (ICDM)
  • 2018

Visual Entailment Task for Visually-Grounded Language Learning

CITATION STATISTICS

  • 18 highly influenced citations
  • Averaged 29 citations per year from 2017 through 2019
