Measure and Improve Robustness in NLP Models: A Survey

Xuezhi Wang, Haohan Wang, Diyi Yang
As NLP models have achieved state-of-the-art performance on benchmarks and gained wide application, it has become increasingly important to ensure their safe deployment in the real world, e.g., to make sure the models are robust against unseen or challenging scenarios. Despite robustness being an increasingly studied topic, it has been explored separately in applications such as vision and NLP, with differing definitions, evaluation strategies, and mitigation strategies across multiple lines of research…


Multi-modal Robustness Analysis Against Language and Visual Perturbations
This work performs the first extensive robustness study of joint visual and language modeling approaches against various real-world perturbations, focusing on video and language, and proposes two large-scale benchmark datasets for text-to-video retrieval.
Why Robust Natural Language Understanding is a Challenge
It is observed that, although the data is almost linearly separable, the verifier struggles to output positive results; the underlying problems and their implications are then discussed.
MRCLens: an MRC Dataset Bias Detection Toolkit
This work introduces MRCLens, a toolkit which detects whether biases exist before users train the full model, and provides a categorization of common biases in MRC.


Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models
This paper aims to automatically identify spurious correlations in NLP models at scale by leveraging existing interpretability methods to extract, from the input text, tokens that significantly affect the model's decision process, and by distinguishing "genuine" from "spurious" tokens.
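A minimal sketch of one interpretability method that can extract influential tokens, as the summary above describes: leave-one-out attribution, where a token's importance is the drop in the model's score when that token is removed. The scoring function and toy "model" here are illustrative assumptions, not the paper's actual method.

```python
def token_importance(score_fn, tokens):
    """Leave-one-out attribution: the importance of token i is the drop
    in the model's score when token i is removed from the input."""
    base = score_fn(tokens)
    return {
        i: base - score_fn(tokens[:i] + tokens[i + 1:])
        for i in range(len(tokens))
    }

# Toy stand-in for a model: scores how "positive" the text looks.
POSITIVE = {"great", "good", "fun"}

def score_fn(tokens):
    return sum(1.0 for t in tokens if t in POSITIVE) / max(len(tokens), 1)

imp = token_importance(score_fn, ["the", "movie", "was", "great"])
top = max(imp, key=imp.get)  # index of the most influential token ("great")
```

Tokens whose removal barely changes the score would be candidates for the "spurious vs. genuine" analysis; in practice the paper applies such attributions at scale across a corpus rather than to a single example.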
A Survey of Data Augmentation Approaches for NLP
This paper introduces and motivates data augmentation for NLP, discusses the major methodologically representative approaches, and highlights techniques used for popular NLP applications and tasks.
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable facing…
In Search of Lost Domain Generalization
This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.
Don’t Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
This paper trains a naive model that makes predictions based solely on dataset biases, and then trains a robust model in an ensemble with the naive one, encouraging the robust model to focus on other patterns in the data that are more likely to generalize.
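The ensemble idea above is often realized as a product-of-experts combination of the two models' predictions; the sketch below shows that combination under the assumption that both models output class log-probabilities (function names and the toy numbers are illustrative, not from the paper).

```python
import numpy as np

def poe_log_probs(robust_log_probs, bias_log_probs):
    """Product-of-experts: sum the robust model's and the bias-only
    model's log-probabilities, then renormalize over classes."""
    combined = robust_log_probs + bias_log_probs
    # Log-softmax renormalization so each row is a valid distribution.
    combined = combined - np.log(np.exp(combined).sum(axis=-1, keepdims=True))
    return combined

def nll_loss(log_probs, labels):
    """Negative log-likelihood of the gold labels under the combined
    distribution; in training, gradients flow only into the robust model."""
    return -np.mean(log_probs[np.arange(len(labels)), labels])

# Toy batch: 2 examples, 3 classes.
robust = np.log(np.array([[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]))
bias = np.log(np.array([[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]]))
loss = nll_loss(poe_log_probs(robust, bias), np.array([0, 1]))
```

Because the bias-only expert already accounts for examples the biases explain, the robust model is pushed to fit the remaining, more generalizable patterns.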
End-to-End Bias Mitigation by Modelling Biases in Corpora
This work proposes two learning strategies to train neural models that are more robust to such biases and transfer better to out-of-domain textual entailment datasets.
Towards Debiasing NLU Models from Unknown Biases
This work presents a self-debiasing framework that prevents models from relying mainly on biases without requiring those biases to be known in advance, and shows that it allows existing debiasing methods to retain their improvements on challenge datasets without targeting specific biases.
Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
This paper casts the problem of automatically generating adversarial examples that violate a set of given First-Order Logic constraints in Natural Language Inference as an optimisation problem: it maximises a quantity measuring the degree of constraint violation, while using a language model to generate linguistically plausible examples.
CAT-Gen: Improving Robustness in NLP Models via Controlled Adversarial Text Generation
This work presents a Controlled Adversarial Text Generation (CAT-Gen) model that, given an input text, generates adversarial texts through controllable attributes that are known to be invariant to task labels.
More Bang for Your Buck: Natural Perturbation for Robust Question Answering
It is found that when natural perturbations are moderately cheaper to create, it is more effective to train models using them: such models exhibit higher robustness and better generalization, while retaining performance on the original BoolQ dataset.