# The Risks of Invariant Risk Minimization

@article{Rosenfeld2020TheRO, title={The Risks of Invariant Risk Minimization}, author={Elan Rosenfeld and Pradeep Ravikumar and Andrej Risteski}, journal={ArXiv}, year={2020}, volume={abs/2010.05761} }

Invariant Causal Prediction (Peters et al., 2016) is a technique for out-of-distribution generalization which assumes that some aspects of the data distribution vary across the training set but that the underlying causal mechanisms remain constant. Recently, Arjovsky et al. (2019) proposed Invariant Risk Minimization (IRM), an objective based on this idea for learning deep, invariant features of data which are a complex function of latent variables; many alternatives have subsequently been…

## 173 Citations

### The Missing Invariance Principle Found - the Reciprocal Twin of Invariant Risk Minimization

- Computer Science · NeurIPS
- 2022

It is proved that, for general linear problems, MRI-v1 guarantees invariant predictors given a sufficient number of environments, and it is empirically demonstrated that MRI-v1 strongly outperforms IRM-v1 and consistently achieves near-optimal OOD generalization in image-based nonlinear problems.

### Invariance Principle Meets Out-of-Distribution Generalization on Graphs

- Computer Science, Mathematics · ArXiv
- 2022

A new framework that captures the invariance of graphs for guaranteed OOD generalization under various distribution shifts is proposed, along with an information-theoretic objective for extracting the desired subgraphs that maximally preserve the invariant intra-class information.

### Provable Domain Generalization via Invariant-Feature Subspace Recovery

- Computer Science · ICML
- 2022

This paper proposes to achieve domain generalization with Invariant-feature Subspace Recovery (ISR), and shows that both ISRs can be used as simple yet effective post-processing methods to improve the worst-case accuracy of (pre-)trained models against spurious correlations and group shifts.

### Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization

- Computer Science · ArXiv
- 2022

It is argued that devising simpler methods for learning predictors on existing features is a promising direction for future research, and Domain-Adjusted Regression (DARE) is introduced, a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.

### Domain Generalization via Nuclear Norm Regularization

- Computer Science · ArXiv
- 2023

This paper proposes a simple and effective regularization method based on the nuclear norm of the learned features for domain generalization that mitigates the impacts of environmental features and encourages learning domain-invariant features.

### Decorr: Environment Partitioning for Invariant Learning and OOD Generalization

- Computer Science · ArXiv
- 2022

This work proposes to split the dataset into several environments by finding low-correlated data subsets and shows that the Decorr method can achieve outstanding performance, while some other partitioning methods may lead to poor results, even below those of ERM, under the same IRM training scheme.

### Bayesian Invariant Risk Minimization

- Computer Science · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022

Bayesian Invariant Risk Minimization (BIRM) is proposed, which introduces Bayesian inference into IRM to estimate the IRM penalty from the posterior distribution of classifiers (as opposed to a single classifier), making it much less prone to overfitting.

### Sparse Invariant Risk Minimization

- Computer Science · ICML
- 2022

This paper proposes a simple yet effective paradigm named Sparse Invariant Risk Minimization (SparseIRM), which employs a global sparsity constraint as a defense to prevent spurious features from leaking in during the whole IRM process.

### Generalizing to Evolving Domains with Latent Structure-Aware Sequential Autoencoder

- Computer Science · ICML
- 2022

A probabilistic framework called Latent Structure-aware Sequential Autoencoder (LSSAE) is proposed to tackle the problem of evolving domain generalization by exploring the underlying continuous structure in the latent space of deep neural networks. Two major factors, covariate shift and concept shift, are identified as accounting for distribution shift in non-stationary environments.

## 44 References

### Invariant Models for Causal Transfer Learning

- Computer Science · J. Mach. Learn. Res.
- 2018

This work relaxes the usual covariate shift assumption and assumes that it holds true for a subset of predictor variables: the conditional distribution of the target variable given this subset of predictors is invariant over all tasks.

### Risk Variance Penalization: From Distributional Robustness to Causality

- Computer Science · ArXiv
- 2020

A framework that unifies Empirical Risk Minimization, Robust Optimization, and Risk Extrapolation (REx) is proposed, together with a novel regularization method derived from REx, Risk Variance Penalization (RVP).

### Out-of-Distribution Generalization via Risk Extrapolation (REx)

- Computer Science · ICML
- 2021

This work introduces the principle of Risk Extrapolation (REx), shows conceptually how this principle enables extrapolation, and demonstrates the effectiveness and scalability of instantiations of REx on various OoD generalization tasks.

### Generalization and Invariances in the Presence of Unobserved Confounding

- Computer Science · ArXiv
- 2020

It is argued that generalization must be defined with respect to a broader class of distribution shifts, irrespective of their origin (arising from changes in observed, unobserved or target variables), and a new learning principle is proposed from which an explicit notion of generalization to certain new environments is expected, even in the presence of hidden confounding.

### A Causal Framework for Distribution Generalization

- Computer Science, Mathematics · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022

The formal framework of distribution generalization is introduced, which characterizes the classes of interventions under which the causal function is minimax optimal, and a practical method, NILE, is proposed that achieves distribution generalization in a nonlinear IV setting with linear extrapolation.

### Invariant Causal Prediction for Nonlinear Models

- Computer Science · Journal of Causal Inference
- 2018

This work presents and evaluates an array of methods for nonlinear and nonparametric versions of ICP for learning the causal parents of given target variables and finds that an approach which first fits a nonlinear model with data pooled over all environments and then tests for differences between the residual distributions across environments is quite robust across a large variety of simulation settings.

### On Learning Invariant Representations for Domain Adaptation

- Computer Science · ICML
- 2019

This paper constructs a simple counterexample showing that, contrary to common belief, learning domain-invariant representations that achieve small source error is not sufficient to guarantee successful domain adaptation, and proposes a natural and interpretable generalization upper bound that explicitly takes into account the conditional shift between source and target domains.

### Learning Predictive Models That Transport

- Computer Science · ArXiv
- 2018

This work removes variables generated by unstable mechanisms from the joint factorization to yield the Graph Surgery Estimator—an interventional distribution that is invariant to the differences across domains.

### Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport

- Computer Science · AISTATS
- 2019

It is proved that the surgery estimator finds stable relationships in strictly more scenarios than previous approaches, which only consider conditional relationships, and it is shown to perform competitively against entirely data-driven approaches.

### Adversarially Robust Generalization Requires More Data

- Computer Science · NeurIPS
- 2018

It is shown that, even in a simple natural data model, the sample complexity of robust learning can be significantly larger than that of "standard" learning.