Corpus ID: 231592355

Robustness Gym: Unifying the NLP Evaluation Landscape

@article{Goel2021RobustnessGU,
  title={Robustness Gym: Unifying the NLP Evaluation Landscape},
  author={Karan Goel and Nazneen Rajani and J. Vig and Samson Tan and J. Wu and Stephan Zheng and Caiming Xiong and M. Bansal and Christopher R{\'e}},
  journal={ArXiv},
  year={2021},
  volume={abs/2101.04840}
}
Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems. Consequently, recent research has focused on testing the robustness of such models, resulting in a diverse set of evaluation methodologies ranging from adversarial attacks to rule-based data transformations. In this work, we identify challenges with evaluating NLP systems and propose a solution in the form of Robustness Gym (RG), a simple and extensible evaluation toolkit that unifies 4 standard evaluation paradigms: subpopulations, transformations, evaluation sets, and adversarial attacks.
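The abstract's central idea is placing all four evaluation paradigms behind a single slice-based interface, so per-slice metrics can be compared directly. Below is a minimal, self-contained Python sketch of that idea under stated assumptions: the function names, the slices, and the toy model are hypothetical illustrations and are not the Robustness Gym API.

```python
# Hypothetical sketch of slice-based robustness evaluation (not the actual
# Robustness Gym API): build evaluation slices from a rule-based
# transformation and a subpopulation, then report per-slice accuracy.
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, int]  # (text, label)

def swap_typos(text: str) -> str:
    """Rule-based transformation: swap the first two characters of each word."""
    return " ".join(w[1] + w[0] + w[2:] if len(w) > 1 else w for w in text.split())

def short_inputs(examples: List[Example]) -> List[Example]:
    """Subpopulation: keep only examples with at most 5 tokens."""
    return [(t, y) for t, y in examples if len(t.split()) <= 5]

def accuracy(model: Callable[[str], int], examples: List[Example]) -> float:
    return sum(model(t) == y for t, y in examples) / len(examples)

def evaluate_slices(model: Callable[[str], int],
                    examples: List[Example]) -> Dict[str, float]:
    """Score the model on the original data and on each evaluation slice."""
    slices: Dict[str, List[Example]] = {
        "original": examples,
        "typo_transformation": [(swap_typos(t), y) for t, y in examples],
        "short_inputs_subpopulation": short_inputs(examples),
    }
    return {name: accuracy(model, ex) for name, ex in slices.items() if ex}

if __name__ == "__main__":
    # Toy sentiment "model": predicts positive (1) iff "good" appears verbatim,
    # so the typo slice should expose its brittleness.
    def toy_model(text: str) -> int:
        return int("good" in text.lower())

    data: List[Example] = [
        ("this is good", 1),
        ("a very good movie indeed", 1),
        ("terrible film", 0),
    ]
    print(evaluate_slices(toy_model, data))
```

Because every methodology reduces to the same (name, examples) slice abstraction, adding an adversarial attack or a curated evaluation set would just mean adding another entry to the slice dictionary, which is the unification the title refers to.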
2 Citations

Evaluating Neural Model Robustness for Machine Comprehension

TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing
Tao Gui, Xiao Wang, +31 authors Xuanjing Huang. ArXiv, 2021.
