Corpus ID: 222377851

Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark

@article{Zhang2020ReliableEF,
  title={Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark},
  author={Guanhua Zhang and Bing Bai and Jian Liang and Kun Bai and Conghui Zhu and T. Zhao},
  journal={ArXiv},
  year={2020},
  volume={abs/2010.07676}
}
Recent studies show that crowd-sourced Natural Language Inference (NLI) datasets may suffer from significant biases such as annotation artifacts. Models that exploit these superficial cues gain illusory advantages on the in-domain test set, which makes the evaluation results over-estimated. The lack of trustworthy evaluation settings and benchmarks stalls the progress of NLI research. In this paper, we propose to assess a model's trustworthy generalization performance with cross-dataset evaluation…
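To make the cross-dataset evaluation idea concrete, here is a minimal sketch (not the paper's actual unified benchmark): train a simple baseline on one NLI dataset and report accuracy both on its in-domain test split and on another dataset's dev split; the gap between the two numbers is the kind of over-estimation the benchmark is meant to expose. The dataset choices ("snli", "multi_nli"), the TF-IDF bag-of-words baseline, and the helper `to_xy` are illustrative assumptions, and the sketch requires the Hugging Face `datasets` library and scikit-learn.

```python
# Sketch of a cross-dataset NLI evaluation loop (illustrative, not the
# paper's benchmark): fit on SNLI, evaluate in-domain and on MNLI.
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def to_xy(split, limit=None):
    """Turn an NLI split into (text, label) lists, dropping unlabeled pairs."""
    rows = [r for r in split if r["label"] != -1]
    if limit:
        rows = rows[:limit]
    texts = [r["premise"] + " [SEP] " + r["hypothesis"] for r in rows]
    labels = [r["label"] for r in rows]
    return texts, labels


# Both datasets share the label mapping 0=entailment, 1=neutral, 2=contradiction.
snli = load_dataset("snli")
mnli = load_dataset("multi_nli")
train_x, train_y = to_xy(snli["train"], limit=50_000)  # subsampled to keep the sketch fast

# A deliberately simple premise+hypothesis bag-of-words baseline.
vec = TfidfVectorizer(max_features=50_000)
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(train_x), train_y)

# In-domain vs. cross-dataset accuracy.
eval_splits = {
    "SNLI test (in-domain)": snli["test"],
    "MNLI matched dev (cross-dataset)": mnli["validation_matched"],
}
for name, split in eval_splits.items():
    x, y = to_xy(split)
    acc = accuracy_score(y, clf.predict(vec.transform(x)))
    print(f"{name}: accuracy = {acc:.3f}")
```

A stronger model would replace the bag-of-words baseline, but the evaluation loop stays the same: performance is reported on datasets the model was never trained on, so annotation artifacts specific to the training corpus no longer inflate the score.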
