Robustness Gym: Unifying the NLP Evaluation Landscape
@article{Goel2021RobustnessGU,
  title   = {Robustness Gym: Unifying the NLP Evaluation Landscape},
  author  = {Karan Goel and Nazneen Rajani and J. Vig and Samson Tan and J. Wu and Stephan Zheng and Caiming Xiong and M. Bansal and Christopher R{\'e}},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2101.04840}
}
Despite impressive performance on standard benchmarks, deep neural networks are often brittle when deployed in real-world systems. Consequently, recent research has focused on testing the robustness of such models, resulting in a diverse set of evaluation methodologies ranging from adversarial attacks to rule-based data transformations. In this work, we identify challenges with evaluating NLP systems and propose a solution in the form of Robustness Gym (RG), a simple and extensible evaluation…
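To make the contrast in the abstract concrete, the sketch below illustrates what a rule-based data transformation for robustness evaluation might look like. This is a minimal, hypothetical example, not the Robustness Gym API: the function names, the contraction rules, and the toy model are assumptions made purely for illustration.

```python
# Hedged sketch of a rule-based robustness check: perturb inputs with simple
# rewrite rules and measure how often the model's prediction stays the same.
# Names here (swap_contractions, consistency_under_transform) are illustrative,
# not part of Robustness Gym.
import re


def swap_contractions(text: str) -> str:
    """Expand common contractions (e.g. "don't" -> "do not") as a simple
    rule-based perturbation; a robust model's prediction should not change."""
    rules = {
        r"\bdon't\b": "do not",
        r"\bcan't\b": "cannot",
        r"\bwon't\b": "will not",
    }
    for pattern, replacement in rules.items():
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


def consistency_under_transform(model, dataset, transform) -> float:
    """Fraction of examples whose prediction is unchanged after the
    transformation (higher means more robust to this perturbation)."""
    consistent = sum(model(x) == model(transform(x)) for x in dataset)
    return consistent / len(dataset)


if __name__ == "__main__":
    # Toy "model" used only to make the example runnable end to end.
    toy_model = lambda text: "long" if len(text.split()) > 5 else "short"
    data = ["I don't like this movie at all", "can't wait"]
    print(consistency_under_transform(toy_model, data, swap_contractions))
```

In a real evaluation such transformations would be applied to held-out test data for a trained model, and the consistency score would be reported alongside standard accuracy.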