Learning Modular Structures That Generalize Out-of-Distribution (Student Abstract)

Arjun Ashok, Chaitanya Devaguptapu, Vineeth N. Balasubramanian
Out-of-distribution (OOD) generalization remains a key challenge for real-world machine learning systems. We describe a method for OOD generalization that, through training, encourages models to preserve only those features that are well reused across multiple training domains. Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network to extract a modular sub-network that achieves better OOD…
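The "probabilistic differentiable binary mask" in the abstract can be approximated with a Gumbel-sigmoid (binary concrete) relaxation. The sketch below is illustrative only and not the authors' implementation; the logits, temperature, and layer size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_sigmoid(logits, temperature=0.5):
    """Differentiable relaxation of a Bernoulli keep/drop mask.

    Each logit parameterizes keep-vs-drop for one neuron. Adding
    logistic noise (a difference of two Gumbels) and applying a
    temperature-scaled sigmoid yields a soft mask in (0, 1) that
    sharpens toward {0, 1} as the temperature approaches zero,
    while remaining differentiable in the logits.
    """
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=np.shape(logits))
    logistic_noise = np.log(u) - np.log(1.0 - u)
    return 1.0 / (1.0 + np.exp(-(np.asarray(logits) + logistic_noise) / temperature))

# Hypothetical mask over a layer of 6 neurons: strong keep logits,
# strong drop logits, and two undecided neurons.
logits = np.array([4.0, 4.0, -4.0, -4.0, 0.0, 0.0])
mask = gumbel_sigmoid(logits, temperature=0.1)
masked_activations = mask * np.ones(6)  # mask applied to unit activations
```

In a full pipeline the logits would be trained jointly with the network, and the mask thresholded after training to extract the modular sub-network.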


Can Subnetwork Structure be the Key to Out-of-Distribution Generalization?

A functional modular probing method is used to analyze deep model structures under the OOD setting, demonstrating that even in biased models (which focus on spurious correlations) there still exist unbiased functional subnetworks.

Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

A novel method based on learning binary weight masks to identify individual weights and subnets responsible for specific functions in NNs is presented, demonstrating how common NNs fail to reuse submodules and offering new insights into the related issue of systematic generalization on language tasks.

Recurrent Independent Mechanisms

Recurrent Independent Mechanisms is proposed, a new recurrent architecture in which multiple groups of recurrent cells operate with nearly independent transition dynamics, communicate only sparingly through the bottleneck of attention, and are only updated at time steps where they are most relevant.

Shortcut Learning in Deep Neural Networks

A set of recommendations for model interpretation and benchmarking is developed, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications.

Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity

This work considers the problem of learning a sparse multi-task regression where the structure in the outputs can be represented as a tree, with leaf nodes as individual outputs and internal nodes as clusters of outputs at multiple granularities, and proposes a structured regularization based on a group-lasso penalty.
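The tree-guided penalty described above can be sketched as an overlapping group lasso, where each tree node induces a group containing the tasks under it. This is a minimal illustration, not the paper's implementation; the toy tree, node weights, and coefficient matrix are hypothetical.

```python
import numpy as np

def tree_group_lasso_penalty(W, groups):
    """Tree-guided group-lasso penalty.

    W: (features, tasks) coefficient matrix.
    groups: list of (task_indices, weight) pairs, one per tree node;
    each node groups the tasks in its subtree, so overlapping groups
    encode the hierarchy. The penalty is the weighted sum, over nodes,
    of the L2 norms of each feature's coefficients within the group
    (an L1/L2 mixed norm that zeroes out whole groups).
    """
    penalty = 0.0
    for tasks, weight in groups:
        penalty += weight * np.sum(np.linalg.norm(W[:, tasks], axis=1))
    return penalty

# Toy tree over 3 tasks: root {0,1,2}, one internal node {0,1},
# and the three leaves, each with its own node weight.
groups = [([0, 1, 2], 0.5), ([0, 1], 0.5), ([0], 1.0), ([1], 1.0), ([2], 1.0)]
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
```

Because groups overlap along root-to-leaf paths, closely related tasks (here, tasks 0 and 1) are encouraged to share the same sparsity pattern across features.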

Categorical Reparameterization with Gumbel-Softmax

It is shown that the Gumbel-Softmax estimator outperforms state-of-the-art gradient estimators on structured output prediction and unsupervised generative modeling tasks with categorical latent variables, and enables large speedups on semi-supervised classification.
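The Gumbel-Softmax estimator summarized above can be sketched in a few lines: add Gumbel noise to the class logits and take a temperature-scaled softmax. The class probabilities, temperature, and sample count below are hypothetical, chosen only to make the behavior visible.

```python
import numpy as np

rng = np.random.default_rng(1)

def gumbel_softmax(logits, temperature):
    """Draw a differentiable sample from a categorical distribution.

    Adds i.i.d. Gumbel(0, 1) noise to the logits and applies a
    temperature-scaled softmax. As the temperature approaches zero the
    sample approaches a one-hot categorical draw (the Gumbel-max trick),
    while remaining differentiable in the logits at positive temperature.
    """
    gumbel = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=np.shape(logits))))
    z = (np.asarray(logits) + gumbel) / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.log(np.array([0.2, 0.3, 0.5]))  # target class probabilities
samples = np.array([gumbel_softmax(logits, 0.1) for _ in range(5000)])
empirical = samples.mean(axis=0)  # approaches [0.2, 0.3, 0.5]
```

At a low temperature each sample is nearly one-hot, so averaging many samples recovers the underlying categorical probabilities while every individual draw stays usable inside backpropagation.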

Invariant Risk Minimization

This work introduces Invariant Risk Minimization, a learning paradigm to estimate invariant correlations across multiple training distributions and shows how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.
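The IRM idea can be sketched with the IRMv1 penalty: per environment, penalize the squared gradient of the risk with respect to a fixed dummy scalar classifier at w = 1. The squared-loss version below admits a closed-form gradient; the two toy environments are hypothetical, and this is a sketch of the penalty only, not the paper's full training objective.

```python
import numpy as np

def irm_penalty(phi_out, y):
    """IRMv1 penalty for one environment (squared loss, scalar dummy classifier).

    Treats the representation output phi_out as the prediction and
    measures the squared gradient of the environment risk
    R_e(w) = mean((w * phi_out - y)^2) with respect to the dummy
    classifier w, evaluated at w = 1:
        dR_e/dw |_{w=1} = 2 * mean(phi_out * (phi_out - y)).
    A representation is invariant when this gradient vanishes in
    every environment, i.e. w = 1 is simultaneously optimal everywhere.
    """
    grad = 2.0 * np.mean(phi_out * (phi_out - y))
    return grad ** 2

# Two hypothetical environments: in env A the representation already
# matches the labels (penalty ~ 0); in env B it is miscalibrated.
phi_a, y_a = np.array([1.0, -1.0, 2.0]), np.array([1.0, -1.0, 2.0])
phi_b, y_b = np.array([1.0, -1.0, 2.0]), np.array([2.0, -2.0, 4.0])
```

Training would then minimize the summed environment risks plus a multiple of this penalty summed over environments, pushing the model toward features whose optimal classifier is shared across all training distributions.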
