# Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?

@inproceedings{Zhang2021CanSS, title={Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?}, author={Dinghuai Zhang and Kartik Ahuja and Yilun Xu and Yisen Wang and Aaron C. Courville}, booktitle={ICML}, year={2021} }

Can models with a particular structure avoid being biased towards spurious correlations in out-of-distribution (OOD) generalization? Peters et al. (2016) provide a positive answer for linear cases. In this paper, we use a functional modular probing method to analyze deep model structures under the OOD setting. We demonstrate that even in biased models (which focus on spurious correlations) there still exist unbiased functional subnetworks. Furthermore, we articulate and demonstrate the functional…

## 25 Citations

Handling Distribution Shifts on Graphs: An Invariance Perspective

- Computer Science
- 2022

This work proposes a new invariant learning approach, Explore-to-Extrapolate Risk Minimization (EERM), that helps graph neural networks leverage invariance principles for prediction, and theoretically shows that the method guarantees a valid OOD solution.

Learning Modular Structures That Generalize Out-of-Distribution (Student Abstract)

- Computer Science
- AAAI
- 2022

This work combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network, to extract a modular sub-network that achieves better O.O.D. performance than the original network.

Model Agnostic Sample Reweighting for Out-of-Distribution Learning

- Computer Science
- ICML
- 2022

This work proposes a principled method, Model Agnostic samPLe rEweighting (MAPLE), to effectively address the OOD problem, especially in overparameterized scenarios, and empirically verifies that it surpasses state-of-the-art methods by a large margin.

Can You Win Everything with A Lottery Ticket?

- Computer Science

This first comprehensive assessment of lottery tickets from diverse aspects beyond test accuracy finds that an appropriate sparsity can yield a winning ticket that performs comparably or even better in all four aspects assessed, although some aspects appear more sensitive to sparsification than others.

A Winning Hand: Compressing Deep Networks Can Improve Out-Of-Distribution Robustness

- Computer Science
- NeurIPS
- 2021

A large-scale analysis of popular model compression techniques that uncovers several intriguing patterns, shows the compatibility of CARDs with popular existing strategies such as data augmentation and model size increase, and proposes a new robustness-improvement strategy that leverages the compactness of CARDs via ensembling.

Sparse Fusion Mixture-of-Experts are Domain Generalizable Learners

- Computer Science
- ArXiv
- 2022

This work examines the generalizability of mixture-of-experts (MoE) models on domain generalization (DG), using them to handle multiple aspects of the predictive features across domains in a distributed manner, and proposes Sparse Fusion Mixture-of-Experts (SF-MoE), which incorporates sparsity and fusion mechanisms into the MoE framework to keep the model both sparse and predictive.

Bayesian Invariant Risk Minimization

- Computer Science
- 2022

Bayesian Invariant Risk Minimization (BIRM) is proposed by introducing Bayesian inference into the IRM to estimate the penalty of IRM based on the posterior distribution of classifiers (as opposed to a single classifier), which is much less prone to overfitting.

Generalizing to Unseen Domains: A Survey on Domain Generalization

- Computer Science
- IJCAI
- 2021

This paper provides a formal definition of domain generalization, discusses several related fields, and categorizes recent algorithms into three classes, presenting each in detail: data manipulation, representation learning, and learning strategy, each of which contains several popular algorithms.

Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization

- Computer Science
- NeurIPS
- 2021

It is proved that a form of the information bottleneck constraint along with invariance helps address key failures when invariant features capture all the information about the label and also retains the existing success when they do not.

Not All Parameters Should Be Treated Equally: Deep Safe Semi-supervised Learning under Class Distribution Mismatch

- Computer Science
- AAAI
- 2022

Safe Parameter Learning (SPL) is proposed to discover safe parameters and make the harmful parameters inactive, such that it can mitigate the adverse effects caused by unseen-class data.

## References

Out-of-Distribution Generalization with Maximal Invariant Predictor

- Computer Science
- ArXiv
- 2020

Basic results of probability are used to prove the Maximal Invariant Predictor condition, a theoretical result that can be used to identify the OOD-optimal solution, and IGA is shown to outperform previous methods on both the original and the extended versions of Colored-MNIST.

Learning Robust Models Using The Principle of Independent Causal Mechanisms

- Computer Science
- GCPR
- 2021

It is shown theoretically and experimentally that neural networks trained in this framework focus on relations remaining invariant across environments and ignore unstable ones, and it is proved that the recovered stable relations correspond to the true causal mechanisms under certain conditions.

Counterfactual Generative Networks

- Computer Science
- ICLR
- 2021

This work proposes to decompose the image generation process into independent causal mechanisms that are trained without direct supervision, which allows for generating counterfactual images, and demonstrates the ability of the model to generate such images on MNIST and ImageNet.

Systematic generalisation with group invariant predictions

- Computer Science
- ICLR
- 2021

This work considers situations where the presence of dominant simpler correlations with the target variable in a training set can cause an SGD-trained neural network to be less reliant on more persistently correlating complex features, and suggests a simple invariance penalty that can perform better than alternatives.

The Risks of Invariant Risk Minimization

- Computer Science
- ICLR
- 2021

In this setting, the first analysis of classification under the IRM objective is presented, and it is found that IRM and its alternatives fundamentally do not improve over standard Empirical Risk Minimization.

Invariant Models for Causal Transfer Learning

- Computer Science
- J. Mach. Learn. Res.
- 2018

This work relaxes the usual covariate shift assumption and assumes that it holds true for a subset of predictor variables: the conditional distribution of the target variable given this subset of predictors is invariant over all tasks.

Out-of-Distribution Generalization via Risk Extrapolation

- Computer Science
- 2020

It is proved that variants of REx can recover the causal mechanisms of the targets, while also providing some robustness to changes in the input distribution, and REx is able to outperform alternative methods such as Invariant Risk Minimization in situations where these types of shift co-occur.

Understanding the Failure Modes of Out-of-Distribution Generalization

- Computer Science
- ICLR
- 2021

This work identifies the fundamental factors that cause models to fail this way in easy-to-learn tasks where one would expect them to succeed, and uncovers two complementary failure modes.

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

- Computer Science
- ArXiv
- 2019

The results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization; the work also introduces a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

- Computer Science
- ICLR
- 2020

This work proposes to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes (e.g. due to interventions, actions of agents, and other sources of non-stationarity), and shows that causal structures can be parameterized via continuous variables and learned end-to-end.