# A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

@article{Bengio2020AMO, title={A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms}, author={Yoshua Bengio and Tristan Deleu and Nasim Rahaman and Nan Rosemary Ke and S{\'e}bastien Lachapelle and Olexa Bilaniuk and Anirudh Goyal and Christopher Joseph Pal}, journal={ArXiv}, year={2020}, volume={abs/1901.10912} }

We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to…
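The claim that changes concentrate in one mechanism under the correct causal factorization can be illustrated with a small discrete example. The following is a toy sketch only, not the paper's meta-transfer objective: the probability tables, the variable names, and the `factorize` helper are all hypothetical and chosen for illustration. It assumes A causes B and that an intervention changes only P(A).

```python
import numpy as np

def factorize(joint):
    """Split a joint table P(A, B) into the causal and anti-causal factors."""
    p_a = joint.sum(axis=1)                  # P(A)
    p_b = joint.sum(axis=0)                  # P(B)
    p_b_given_a = joint / p_a[:, None]       # P(B | A), rows indexed by A
    p_a_given_b = (joint / p_b[None, :]).T   # P(A | B), rows indexed by B
    return p_a, p_b_given_a, p_b, p_a_given_b

# Hypothetical ground truth: A causes B.
p_a_train = np.array([0.7, 0.2, 0.1])
p_b_given_a = np.array([[0.8, 0.1, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.2, 0.2, 0.6]])

# Sparse change: an intervention replaces P(A); the mechanism P(B | A) is untouched.
p_a_transfer = np.array([0.1, 0.3, 0.6])

joint_train = p_a_train[:, None] * p_b_given_a
joint_transfer = p_a_transfer[:, None] * p_b_given_a

names = ["P(A)", "P(B|A)", "P(B)", "P(A|B)"]
before = factorize(joint_train)
after = factorize(joint_transfer)

# Largest absolute change of each factor between the two distributions.
shift = {n: float(np.abs(x - y).max()) for n, x, y in zip(names, before, after)}

for name, value in shift.items():
    print(f"{name:7s} max change = {value:.3f}")
```

In this toy setting the causal factorization P(A)P(B|A) has only one factor to re-learn after the intervention, while both factors of the anti-causal factorization P(B)P(A|B) change, which is the intuition behind expecting faster adaptation for the correct structure.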

## 204 Citations

Disentangling Causal Effects for Hierarchical Reinforcement Learning

- Computer Science
- ArXiv
- 2020

Experimental results show that random effect exploration is a more efficient mechanism and that by assigning credit to few effects rather than many actions, CEHRL learns tasks more rapidly.

A Meta Learning Approach to Discerning Causal Graph Structure

- Computer Science
- ArXiv
- 2021

This work uses meta-learning to derive the causal direction between variables, extending previous work in Bengio et al. (2019), and adds a simple extension that uses a rotational encoder-decoder structure in representation learning to demonstrate improved learning performance.

On the Generalization and Adaption Performance of Causal Models

- Computer Science
- ArXiv
- 2022

This work systematically studies the generalization and adaptation performance of such modular neural causal models by comparing them to monolithic models and to structured models where the set of predictors is not constrained to causal parents.

An Analysis of the Adaptation Speed of Causal Models

- Computer Science
- AISTATS
- 2021

This work uses convergence rates from stochastic optimization to justify that a relevant proxy for adaptation speed is distance in parameter space after intervention, and shows that the SCM with the correct causal direction is advantaged for categorical and normal cause-effect datasets when the intervention is on the cause variable.

Disentangling Controlled Effects for Hierarchical Reinforcement Learning

- Computer Science
- CLeaR
- 2022

This work introduces CEHRL, a hierarchical method leveraging the compositional nature of controlled effects to expedite the learning of task-specific behavior and aid exploration, and shows that using effects instead of actions provides a more efficient exploration mechanism.

Efficiently Disentangle Causal Representations

- Computer Science
- ArXiv
- 2022

An efficient approach to learning disentangled representations with causal mechanisms based on the difference of conditional probabilities in original and new distributions that requires evaluating the model’s generalization ability is proposed.

Disentangled Generative Causal Representation Learning

- Computer Science
- ArXiv
- 2020

The key ingredient of this new formulation is to use a structural causal model (SCM) as the prior for a bidirectional generative model and the prior is trained jointly with a generator and an encoder using a suitable GAN loss.

Variational Causal Dynamics: Discovering Modular World Models from Interventions

- Computer Science
- ArXiv
- 2022

Variational causal dynamics (VCD) is presented, a structured world model that exploits the invariance of causal mechanisms across environments to achieve fast and modular adaptation and is able to identify reusable components across different environments by causally factorising a transition model.

Robustifying Sequential Neural Processes

- Computer Science
- ICML
- 2020

This paper proposes a new attention mechanism, Recurrent Memory Reconstruction (RMR), and demonstrates that providing an imaginary context that is recurrently updated and reconstructed with interaction is crucial in achieving effective attention for meta-transfer learning.

Prequential MDL for Causal Structure Learning with Neural Networks

- Computer Science
- ArXiv
- 2021

It is shown that the prequential minimum description length principle can be used to derive a practical scoring function for Bayesian networks when flexible and overparametrized neural networks are used to model the conditional probability distributions between observed variables.

## References

Learning Independent Causal Mechanisms

- Computer Science
- ICML
- 2018

This work develops an algorithm to recover a set of independent (inverse) mechanisms from a set of transformed data points, based on a set of experts that compete for data generated by the mechanisms, driving specialization.

Domain Adaptation by Using Causal Inference to Predict Invariant Conditional Distributions

- Computer Science
- NeurIPS
- 2018

This work proposes an approach for solving causal domain adaptation problems that exploits causal inference and does not rely on prior knowledge of the causal graph, the type of interventions or the intervention targets, and demonstrates a possible implementation on simulated and real world data.

Invariant Models for Causal Transfer Learning

- Computer Science
- J. Mach. Learn. Res.
- 2018

This work relaxes the usual covariate shift assumption and assumes that it holds true for a subset of predictor variables: the conditional distribution of the target variable given this subset of predictors is invariant over all tasks.

Causal Reasoning from Meta-reinforcement Learning

- Computer Science
- ArXiv
- 2019

It is suggested that causal reasoning in complex settings may benefit from the more end-to-end learning-based approaches presented here, and this work offers new strategies for structured exploration in reinforcement learning, by providing agents with the ability to perform -- and interpret -- experiments.

Learning Neural Causal Models from Unknown Interventions

- Computer Science
- ArXiv
- 2019

This paper provides a general framework based on continuous optimization and neural networks to create models for the combination of observational and interventional data and establishes strong benchmark results on several structure learning tasks.

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations

- Computer Science
- ICML
- 2019

This paper theoretically shows that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data, and trains more than 12000 models covering most prominent methods and evaluation metrics on seven different data sets.

Causal inference by using invariant prediction: identification and confidence intervals

- Computer Science, Mathematics
- 2015

This work proposes to exploit invariance of a prediction under a causal model for causal inference: given different experimental settings (e.g. various interventions), the authors collect all models that show invariance in their predictive accuracy across settings and interventions, which yields valid confidence intervals for the causal relationships in quite general scenarios.

Causal feature learning: an overview

- Computer Science
- 2017

A detailed introduction to the causal inference framework is presented, laying out the definitions and algorithmic steps, and a simple example illustrates the techniques involved in the learning steps and provides visual intuition.

Learning to Learn with Gradients

- Computer Science
- 2018

This thesis discusses gradient-based algorithms for learning to learn, or meta-learning, which aim to endow machines with flexibility akin to that of humans, and shows how these methods can be extended for applications in motor control by combining elements of meta-learning with techniques for deep model-based reinforcement learning, imitation learning, and inverse reinforcement learning.

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

- Computer Science
- ICML
- 2017

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning…