Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models

Tianlu Wang, Diyi Yang, Xuezhi Wang
Recently, NLP models have achieved remarkable progress across a variety of tasks; however, they have also been criticized for not being robust. Many robustness problems can be attributed to models exploiting spurious correlations, or shortcuts, between the training data and the task labels. Most existing work identifies a limited set of task-specific shortcuts via human priors or error analyses, which requires extensive expertise and effort. In this paper, we aim to automatically identify such…


Measure and Improve Robustness in NLP Models: A Survey

A unifying survey of how to define, measure and improve robustness in NLP is provided, which first connects multiple definitions of robustness, then unifies various lines of work on identifying robustness failures and evaluating models’ robustness.

A Rationale-Centric Framework for Human-in-the-loop Machine Learning

A novel rationale-centric, human-in-the-loop framework, Rationales-centric Double-robustness Learning (RDL), is proposed to boost model out-of-distribution performance in few-shot learning scenarios, enabling fast and accurate generalisation.

A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes

The sensitivity of a broad set of models to noise corruptions in foregrounds, backgrounds and attributes is evaluated, and saliency methods are used to discover spurious features that drive the background sensitivity of models and to assess the alignment of saliency maps with foregrounds.

Less is Better: Recovering Intended-Feature Subspace to Robustify NLU Models

A novel model, RISK, is developed that consistently improves model generalization to out-of-distribution sets and achieves new state-of-the-art performance on NLU tasks.

A Survey on Measuring and Mitigating Reasoning Shortcuts in Machine Reading Comprehension

This survey paper focuses on the task of machine reading comprehension (MRC), an important task for showcasing high-level language understanding that also suffers from a range of shortcuts, and summarizes the available techniques for measuring and mitigating shortcuts.

Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey

This survey reviews methods to identify shortcut learning behavior in LLMs, characterizes the reasons for shortcut learning, introduces mitigation solutions, and identifies key challenges.

Does Your Model Classify Entities Reasonably? Diagnosing and Mitigating Spurious Correlations in Entity Typing

Experimental results on the UFET dataset show that the counterfactual data augmentation approach helps improve generalization of different entity typing models with consistently better performance on both in- and out-of-distribution test sets.



An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models

This work proposes to use multi-task learning (MTL) to improve generalization in the case of extreme minority models, and shows that MTL with the right auxiliary tasks significantly improves performance on challenging examples without hurting the in-distribution performance.

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models

This work shows that the words in the NLU training set can be modeled as a long-tailed distribution, and proposes a shortcut mitigation framework, LGTR, to discourage the model from making overconfident predictions for samples with a large shortcut degree.

HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalizability

HiddenCut is a simple yet effective data augmentation technique that better regularizes the model and encourages it to learn more generalizable features; it outperforms the state-of-the-art augmentation methods on the GLUE benchmark and consistently exhibits superior generalization performance on out-of-distribution and challenging counterexamples.

Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles

This paper proposes a method that can automatically detect and ignore dataset-specific patterns, which it hypothesizes are likely to reflect dataset bias, and trains a lower capacity model in an ensemble with a higher capacity model.

Identifying spurious correlations for robust text classification

This paper treats the identification of spurious correlations as a supervised classification problem, using features derived from treatment effect estimators to distinguish spurious correlations from “genuine” ones, and finds that the approach works well even with limited training examples, and that it is possible to transport the word classifier to new domains.

Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals

This paper proposes to train a robust text classifier by augmenting the training data with automatically generated counterfactual data and shows that the robust classifier makes meaningful and trustworthy predictions by emphasizing causal features and de-emphasizing non-causal features.

Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately

This work completely characterizes how the removal of spurious features affects accuracy across different groups (more generally, test distributions) and shows that robust self-training produces models that no longer depend on spurious features without affecting their overall accuracy.

Robustness to Spurious Correlations via Human Annotations

A framework for making models robust to spurious correlations by leveraging humans' common sense knowledge of causality is presented and a new distributionally robust optimization objective over unmeasured variables (UV-DRO) is introduced to control the worst-case loss over possible test-time shifts.

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

There is substantial room for improvement in NLI systems, and the HANS dataset, which contains many examples where the heuristics fail, can motivate and measure progress in this area.

Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection

This work builds hierarchical explanations by detecting feature interactions and visualizes how words and phrases are combined at different levels of the hierarchy, which can help users understand the decision-making of black-box models.