Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models
@inproceedings{Wang2022IdentifyingAM,
  title={Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models},
  author={Tianlu Wang and Diyi Yang and Xuezhi Wang},
  booktitle={NAACL-HLT},
  year={2022}
}
Recently, NLP models have achieved remarkable progress across a variety of tasks; however, they have also been criticized for not being robust. Many robustness problems can be attributed to models exploiting spurious correlations, or shortcuts, between the training data and the task labels. Most existing work identifies a limited set of task-specific shortcuts via human priors or error analyses, which requires extensive expertise and effort. In this paper, we aim to automatically identify such…
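The abstract above is truncated and does not spell out the identification procedure. As a generic, hypothetical illustration of surfacing candidate shortcuts (not the authors' method), one can rank tokens by how strongly they co-occur with a single label, e.g. via pointwise mutual information; the toy corpus and threshold below are made up.

```python
# Hypothetical sketch: rank tokens by label-conditional PMI to surface
# candidate "shortcut" tokens. This is a generic illustration, not the
# identification method proposed in the paper.
import math
from collections import Counter

def pmi_shortcut_candidates(texts, labels, min_count=5):
    token_counts = Counter()
    joint_counts = Counter()          # (token, label) pairs
    label_counts = Counter(labels)
    n = len(texts)
    for text, label in zip(texts, labels):
        for tok in set(text.lower().split()):
            token_counts[tok] += 1
            joint_counts[(tok, label)] += 1
    scores = {}
    for (tok, label), c in joint_counts.items():
        if token_counts[tok] < min_count:
            continue
        p_joint = c / n
        p_tok = token_counts[tok] / n
        p_label = label_counts[label] / n
        scores[(tok, label)] = math.log(p_joint / (p_tok * p_label))
    # High-PMI (token, label) pairs are candidate spurious correlations.
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy usage with made-up sentiment data:
texts = ["great acting and spielberg", "spielberg film was great",
         "terrible plot", "boring and terrible"]
labels = ["pos", "pos", "neg", "neg"]
print(pmi_shortcut_candidates(texts, labels, min_count=1)[:5])
```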
4 Citations
Measure and Improve Robustness in NLP Models: A Survey
- Computer Science, NAACL
- 2022
This paper provides a unifying survey of how to define, measure, and improve robustness in NLP: it first connects multiple definitions of robustness, then unifies various lines of work on identifying robustness failures and evaluating models' robustness.
A Rationale-Centric Framework for Human-in-the-loop Machine Learning
- Computer Science, ACL
- 2022
A novel rationale-centric framework with humans in the loop, Rationales-centric Double-robustness Learning (RDL), is proposed to boost model out-of-distribution performance in few-shot learning scenarios, enabling fast and accurate generalisation.
A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes
- Computer Science, arXiv
- 2022
The sensitivity of a broad set of models to noise corruptions in foregrounds, backgrounds, and attributes is evaluated; saliency methods are used to discover spurious features that drive models' background sensitivity and to assess how well saliency maps align with foregrounds.
Does Your Model Classify Entities Reasonably? Diagnosing and Mitigating Spurious Correlations in Entity Typing
- Computer Science, arXiv
- 2022
Experimental results on the UFET dataset show that the counterfactual data augmentation approach helps improve generalization of different entity typing models with consistently better performance on both in- and out-of-distribution test sets.
References
Showing 1-10 of 51 references
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models
- Computer Science, Transactions of the Association for Computational Linguistics
- 2020
This work proposes to use multi-task learning (MTL) to improve generalization in the case of extreme minority examples, and shows that MTL with the right auxiliary tasks significantly improves performance on challenging examples without hurting in-distribution performance.
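As a rough sketch of the multi-task learning recipe summarized above, the main-task and auxiliary-task losses can simply be summed; the encoder, heads, auxiliary task, and weighting factor here are hypothetical placeholders, not the paper's setup.

```python
# Minimal sketch of multi-task learning with an auxiliary task.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU())    # stand-in encoder
main_head = nn.Linear(64, 3)                               # main-task classes
aux_head = nn.Linear(64, 2)                                # auxiliary-task classes
params = list(encoder.parameters()) + list(main_head.parameters()) + list(aux_head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
aux_weight = 0.5                                           # assumed trade-off

def mtl_step(x, y_main, y_aux):
    h = encoder(x)
    loss = loss_fn(main_head(h), y_main) + aux_weight * loss_fn(aux_head(h), y_aux)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch:
x = torch.randn(8, 128)
print(mtl_step(x, torch.randint(0, 3, (8,)), torch.randint(0, 2, (8,))))
```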
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models
- Computer Science, NAACL
- 2021
This work shows that the words in the NLU training set can be modeled as a long-tailed distribution, and proposes a shortcut mitigation framework, LGTR, to discourage the model from making overconfident predictions for samples with a large shortcut degree.
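A generic way to discourage overconfident predictions on high-shortcut-degree samples is to soften the target distribution in proportion to that degree; the sketch below illustrates that idea only and is not the framework's implementation.

```python
# Generic sketch: soften the one-hot target for samples with a high shortcut
# degree so the model is penalized for overconfident predictions on them.
import torch
import torch.nn.functional as F

def softened_loss(logits, labels, shortcut_degree):
    """shortcut_degree in [0, 1]: higher means the sample relies more on shortcuts."""
    num_classes = logits.size(-1)
    one_hot = F.one_hot(labels, num_classes).float()
    uniform = torch.full_like(one_hot, 1.0 / num_classes)
    # Interpolate toward the uniform distribution for high-shortcut samples.
    target = (1 - shortcut_degree).unsqueeze(-1) * one_hot + shortcut_degree.unsqueeze(-1) * uniform
    return -(target * F.log_softmax(logits, dim=-1)).sum(-1).mean()

logits = torch.randn(4, 3)
labels = torch.tensor([0, 1, 2, 0])
degree = torch.tensor([0.1, 0.9, 0.5, 0.0])
print(softened_loss(logits, labels, degree))
```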
HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalizability
- Computer Science, ACL
- 2021
A simple yet effective data augmentation technique, HiddenCut, better regularizes the model and encourages it to learn more generalizable features; it outperforms state-of-the-art augmentation methods on the GLUE benchmark and consistently exhibits superior generalization on out-of-distribution and challenging counterexamples.
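A minimal sketch of the HiddenCut idea is to mask one contiguous span of token hidden states per example during training; the span length and selection strategy below are simplifying assumptions rather than the paper's exact procedure.

```python
# Rough sketch: zero out a random contiguous span of hidden states so the
# model cannot rely on any single region of the input representation.
import torch

def hidden_cut(hidden, cut_ratio=0.2):
    """hidden: (batch, seq_len, dim). Zero out one random contiguous span per example."""
    batch, seq_len, _ = hidden.shape
    cut_len = max(1, int(seq_len * cut_ratio))
    out = hidden.clone()
    for i in range(batch):
        start = torch.randint(0, seq_len - cut_len + 1, (1,)).item()
        out[i, start:start + cut_len, :] = 0.0
    return out

h = torch.randn(2, 10, 8)
print(hidden_cut(h).abs().sum(dim=-1))   # zeros mark the cut spans
```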
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles
- Computer Science, Findings
- 2020
This paper proposes a method that can automatically detect and ignore dataset-specific patterns, which it hypothesizes are likely to reflect dataset bias, by training a lower capacity model in an ensemble with a higher capacity model.
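Mixed-capacity ensembling of this kind is commonly realized in a product-of-experts style; the sketch below illustrates that general pattern with placeholder architectures, not the paper's exact training procedure.

```python
# Sketch of a product-of-experts style ensemble: a low-capacity model is
# trained jointly with a high-capacity model so the latter focuses on examples
# the simple model cannot explain.
import torch
import torch.nn as nn
import torch.nn.functional as F

low_capacity = nn.Linear(128, 3)                                # simple, bias-prone model
high_capacity = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 3))

def ensemble_loss(x, y):
    low_logits = low_capacity(x)
    high_logits = high_capacity(x)
    # Combine in log space (product of experts); both models receive the ensemble loss.
    combined = F.log_softmax(low_logits, -1) + F.log_softmax(high_logits, -1)
    return F.cross_entropy(combined, y)

x, y = torch.randn(8, 128), torch.randint(0, 3, (8,))
print(ensemble_loss(x, y))          # at test time only high_capacity would be used
```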
Identifying spurious correlations for robust text classification
- Computer Science, Findings
- 2020
This paper treats the identification of spurious correlations as a supervised classification problem, using features derived from treatment effect estimators to distinguish spurious correlations from "genuine" ones; the approach works well even with limited training examples, and the word classifier can be transported to new domains.
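One simple, hypothetical stand-in for a treatment-effect-style feature is the change in a classifier's predicted probability when a word is removed; the `predict_proba` function below is a placeholder, not the estimators used in the paper.

```python
# Crude illustration of a treatment-effect-style feature for a word: the change
# in a classifier's predicted probability when the word is removed from a text.
def word_effect(predict_proba, text, word, label):
    """predict_proba(text, label) -> probability of `label` for `text` (hypothetical)."""
    tokens = text.split()
    without = " ".join(t for t in tokens if t.lower() != word.lower())
    return predict_proba(text, label) - predict_proba(without, label)

# Words whose removal barely changes the prediction, yet correlate strongly with
# a label, are candidates for being spurious rather than genuine.
```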
Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals
- Computer Science, AAAI
- 2021
This paper proposes to train a robust text classifier by augmenting the training data with automatically generated counterfactual data and shows that the robust classifier makes meaningful and trustworthy predictions by emphasizing causal features and de-emphasizing non-causal features.
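As a toy sketch of counterfactual data augmentation: substitute antonyms for sentiment-bearing words and flip the label. The antonym lexicon and label-flipping rule below are assumptions for illustration, not the paper's generation method.

```python
# Simplified sketch of counterfactual data augmentation for sentiment labels.
ANTONYMS = {"good": "bad", "great": "terrible", "love": "hate"}   # hypothetical lexicon

def make_counterfactual(text, label):
    tokens = [ANTONYMS.get(t.lower(), t) for t in text.split()]
    changed = any(t.lower() in ANTONYMS for t in text.split())
    if not changed:
        return None
    flipped = "neg" if label == "pos" else "pos"
    return " ".join(tokens), flipped

print(make_counterfactual("I love this great movie", "pos"))
# Augmenting training data with such pairs encourages the model to rely on
# causal features rather than co-occurring but non-causal ones.
```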
Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately
- Computer Science, FAccT
- 2021
This work completely characterizes how the removal of spurious features affects accuracy across different groups (more generally, test distributions), and shows that robust self-training produces models that no longer depend on spurious features without affecting their overall accuracy.
Robustness to Spurious Correlations via Human Annotations
- Computer Science, ICML
- 2020
A framework for making models robust to spurious correlations by leveraging humans' common-sense knowledge of causality is presented, and a new distributionally robust optimization objective over unmeasured variables (UV-DRO) is introduced to control the worst-case loss over possible test-time shifts.
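The sketch below shows the general distributionally robust objective of minimizing the worst-case loss over annotated groups; the paper's UV-DRO objective over unmeasured variables is more involved than this illustration.

```python
# Sketch of a worst-case (group DRO style) objective over annotated groups.
import torch
import torch.nn.functional as F

def worst_group_loss(logits, labels, group_ids):
    losses = F.cross_entropy(logits, labels, reduction="none")
    group_losses = [losses[group_ids == g].mean() for g in group_ids.unique()]
    return torch.stack(group_losses).max()   # optimize the hardest group

logits = torch.randn(8, 2)
labels = torch.randint(0, 2, (8,))
groups = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2])
print(worst_group_loss(logits, labels, groups))
```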
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
- Computer Science, ACL
- 2019
The HANS dataset, which contains many examples where these syntactic heuristics fail, shows that there is substantial room for improvement in NLI systems and can motivate and measure progress in this area.
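A tiny sketch of the lexical-overlap heuristic that HANS is built to expose: predicting entailment whenever every hypothesis word appears in the premise. A model relying on this shortcut fails on the second example below.

```python
# The lexical-overlap heuristic: predict "entailment" whenever all hypothesis
# words occur in the premise, regardless of word order or meaning.
def lexical_overlap_predict(premise, hypothesis):
    prem = set(premise.lower().split())
    hyp = set(hypothesis.lower().split())
    return "entailment" if hyp <= prem else "non-entailment"

print(lexical_overlap_predict("The doctor paid the actor", "The doctor paid the actor"))  # entailment (correct)
print(lexical_overlap_predict("The doctor paid the actor", "The actor paid the doctor"))  # entailment (wrong)
```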
Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection
- Computer Science, ACL
- 2020
This work builds hierarchical explanations by detecting feature interactions and visualizes how words and phrases are combined at different levels of the hierarchy, which can help users understand the decision-making of black-box models.
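A toy sketch of one way to score the interaction between two adjacent spans is to compare the model's score with both spans present against each span's individual effect; the `score` function below is a hypothetical black-box scorer, not the paper's detection method.

```python
# Toy interaction score between two adjacent word spans, computed as a second
# difference of a black-box score over masked inputs.
def interaction_score(score, tokens, left, right):
    """left, right: (start, end) index pairs of adjacent spans in `tokens`;
    score(text) -> scalar model score (hypothetical)."""
    def masked(keep_ranges):
        kept = [t if any(s <= i < e for s, e in keep_ranges) else "[MASK]"
                for i, t in enumerate(tokens)]
        return " ".join(kept)
    both = score(masked([left, right]))
    only_left = score(masked([left]))
    only_right = score(masked([right]))
    neither = score(masked([]))
    # A large value means the spans contribute jointly beyond their separate effects.
    return (both - only_left) - (only_right - neither)
```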