# On causal and anticausal learning

@inproceedings{Schlkopf2012OnCA, title={On causal and anticausal learning}, author={B. Sch{\"o}lkopf and D. Janzing and J. Peters and Eleni Sgouritsa and Kun Zhang and J. Mooij}, booktitle={ICML}, year={2012} }

We consider the problem of function estimation in the case where an underlying causal model can be inferred. This has implications for popular scenarios such as covariate shift, concept drift, transfer learning and semi-supervised learning. We argue that causal knowledge may facilitate some approaches for a given problem, and rule out others. In particular, we formulate a hypothesis for when semi-supervised learning can help, and corroborate it with empirical results.
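The paper's central hypothesis is that semi-supervised learning can help in the anticausal direction (predicting a cause from its effect), because there the marginal distribution of the input carries information about the target conditional. A minimal toy sketch of this idea (not the paper's actual experiments; the setup, class means, and clustering rule here are illustrative assumptions): the class label is the cause, the observed feature is the effect, and clustering unlabeled effect values alone recovers the class structure.

```python
import random

random.seed(0)

# Toy anticausal setting: class label C is the cause, feature E the effect.
# E | C=c ~ N(2c, 0.5), so the marginal P(E) is a two-component mixture whose
# cluster structure lines up with the classes -- unlabeled E values are
# informative about P(C | E).
n = 200
cause = [random.randint(0, 1) for _ in range(n)]
effect = [random.gauss(2.0 * c, 0.5) for c in cause]

def two_means(xs, iters=20):
    """Minimal 1-D 2-means clustering: returns the two cluster centers."""
    m0, m1 = min(xs), max(xs)
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - m0) <= abs(x - m1)]
        g1 = [x for x in xs if abs(x - m0) > abs(x - m1)]
        m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return m0, m1

# "Semi-supervised" step: cluster the unlabeled effect values, then predict
# the class by nearest cluster center (lower center = class 0 here).
m0, m1 = two_means(effect)
lo, hi = sorted((m0, m1))
pred = [0 if abs(e - lo) <= abs(e - hi) else 1 for e in effect]
accuracy = sum(p == c for p, c in zip(pred, cause)) / n
print(f"accuracy from cluster structure alone: {accuracy:.2f}")
```

In the causal direction the input marginal is, by the paper's independence-of-mechanism argument, uninformative about the target conditional, so no analogous gain from unlabeled data is expected.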

#### 269 Citations

Semi-supervised Learning in Causal and Anticausal Settings

- Computer Science
- Empirical Inference
- 2013

This work formulates the hypothesis that semi-supervised learning can help in an anticausal setting but not in a causal setting, and corroborates it with empirical results.

Causal Transfer Learning

- Computer Science, Mathematics
- ArXiv
- 2017

This work considers a class of causal transfer learning problems, where multiple training sets are given that correspond to different external interventions, and the task is to predict the distribution of a target variable given measurements of other variables for a new (yet unseen) intervention on the system.

Semi-supervised learning, causality, and the conditional cluster assumption

- Mathematics, Computer Science
- UAI
- 2020

It is argued that, in the more general setting, semi-supervised learning should use information in the conditional distribution of effect features given causal features; it is shown how this insight generalises the previous understanding and can be exploited algorithmically for SSL.

Justifying Information-Geometric Causal Inference

- Mathematics
- 2015

Information-Geometric Causal Inference (IGCI) is a new approach to distinguish between cause and effect for two variables. It is based on an independence assumption between input distribution and…

Causal Inference on Discrete Data via Estimating Distance Correlations

- Mathematics, Computer Science
- Neural Computation
- 2016

This article proposes to infer the causal direction by comparing the distance correlation between P(X) and P(Y|X) with the distance correlation between P(Y) and P(X|Y), inferring that X causes Y if the former dependence coefficient is smaller.
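The core quantity in that approach is the distance correlation, a dependence measure that is zero only under independence. A minimal sketch of the sample statistic for two 1-D samples follows (the cited paper applies it to (marginal, conditional) distribution pairs of discrete variables, which this sketch does not reproduce):

```python
from math import sqrt

def _centered_dists(v):
    """Double-centered pairwise-distance matrix of a 1-D sample."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]           # row means (= column means)
    grand = sum(row) / n                    # grand mean
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]

def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D samples."""
    n = len(x)
    A, B = _centered_dists(x), _centered_dists(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dvarx = sum(a * a for r in A for a in r) / n**2
    dvary = sum(b * b for r in B for b in r) / n**2
    denom = sqrt(dvarx * dvary)
    return sqrt(dcov2 / denom) if denom > 0 else 0.0

x = [0.0, 1.0, 2.0, 3.0, 4.0]
print(distance_correlation(x, x))         # identical samples -> ~1.0
print(distance_correlation(x, [1.0] * 5)) # degenerate sample -> 0.0
```

This O(n²) formulation is fine for a sketch; production code would use an optimized implementation.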

Learning Causal Structures Using Regression Invariance

- Computer Science, Mathematics
- NIPS
- 2017

A notion of completeness for a causal inference algorithm in this setting is defined and an alternate algorithm is presented that has significantly improved computational and sample complexity compared to the baseline algorithm.

Error asymmetry in causal and anticausal regression

- Computer Science, Mathematics
- ArXiv
- 2016

The theorem is formulated that the expected error of the true data-generating function, used as a prediction model, is generally smaller when the effect is predicted from its cause and, conversely, greater when the cause is predicted from its effect.

A Semi-supervised Approach to Discover Bivariate Causality in Large Biological Data

- Computer Science
- MLDM
- 2018

This work addresses the problem of causal inference in the bivariate case, where the joint distribution of two variables is observed, and assesses state-of-the-art causal inference methods for continuous data.

Learning Representations for Counterfactual Inference

- Computer Science, Mathematics
- ICML
- 2016

A new algorithmic framework for counterfactual inference is proposed which brings together ideas from domain adaptation and representation learning and significantly outperforms the previous state-of-the-art approaches.

Causal discovery with continuous additive noise models

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2014

If the observational distribution follows a structural equation model with an additive noise structure, the directed acyclic graph becomes identifiable from the distribution under mild conditions. This constitutes an interesting alternative to traditional methods that assume faithfulness and identify only the Markov equivalence class of the graph, leaving some edges undirected.

#### References

Showing 1–10 of 20 references.

Robust Learning via Cause-Effect Models

- Computer Science, Mathematics
- ArXiv
- 2011

It is argued that knowledge of an underlying causal direction can facilitate several of these tasks such as covariate shift, concept drift, transfer learning and semi-supervised learning, which could be tackled depending on the kind of changes of the distributions.

Regression by dependence minimization and its application to causal inference in additive noise models

- Mathematics, Computer Science
- ICML '09
- 2009

This work proposes a novel method for regression that minimizes the statistical dependence between regressors and residuals, and proposes an algorithm for efficiently inferring causal models from observational data for more than two variables.

Inferring deterministic causal relations

- Computer Science, Mathematics
- UAI
- 2010

This paper considers two variables that are related to each other by an invertible function, and shows that even in the deterministic (noise-free) case, there are asymmetries that can be exploited for causal inference.

Controlling Selection Bias in Causal Inference

- Psychology, Computer Science
- AISTATS
- 2012

This paper highlights several graphical and algebraic methods capable of mitigating and sometimes eliminating selection bias, generalizes and improves previously reported results, and identifies the type of knowledge that needs to be available for reasoning in the presence of selection bias.

Nonlinear causal discovery with additive noise models

- Computer Science, Mathematics
- NIPS
- 2008

It is shown that the basic linear framework can be generalized to nonlinear models and that, in this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified.
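A toy sketch of the additive-noise-model idea described above (the data-generating function, cubic regression, and crude dependence proxy here are illustrative assumptions; the paper uses proper nonparametric regression and HSIC independence tests): generate Y = f(X) + N with f nonlinear and N independent of X, regress in both directions, and check in which direction the residuals look independent of the input.

```python
import random

random.seed(1)

# Y = X^3 + N, with N uniform and independent of X.
n = 300
x = [random.uniform(-1, 1) for _ in range(n)]
y = [xi ** 3 + random.uniform(-0.1, 0.1) for xi in x]

def polyfit3(u, v):
    """Least-squares cubic fit v ~ c0 + c1*u + c2*u^2 + c3*u^3 via normal equations."""
    A = [[sum(ui ** (i + j) for ui in u) for j in range(4)] for i in range(4)]
    b = [sum(vi * ui ** i for ui, vi in zip(u, v)) for i in range(4)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 4):
            f = A[r][col] / A[col][col]
            for k in range(col, 4):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    c = [0.0] * 4
    for i in range(3, -1, -1):
        c[i] = (b[i] - sum(A[i][j] * c[j] for j in range(i + 1, 4))) / A[i][i]
    return c

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

def dependence_score(u, v):
    """Crude residual-dependence proxy: |corr(residual^2, input^2)|."""
    c = polyfit3(u, v)
    res = [vi - sum(c[k] * ui ** k for k in range(4)) for ui, vi in zip(u, v)]
    return abs(corr([r * r for r in res], [ui * ui for ui in u]))

causal = dependence_score(x, y)      # regress effect on cause
anticausal = dependence_score(y, x)  # regress cause on effect
print(f"causal score {causal:.3f} vs anticausal score {anticausal:.3f}")
```

Because the forward model lies in the fitted family, the forward residuals reduce to the independent noise, while the backward fit leaves residuals whose magnitude varies systematically with the input; the direction with the smaller dependence score is inferred as causal.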

When Training and Test Sets are Different: Characterising Learning Transfer

- Computer Science
- 2013

There is potential for the development of models which capture the specific types of variations, combine different modes of variation, or do model selection to assess whether dataset shift is an issue in particular circumstances.

When Training and Test Sets Are Different: Characterizing Learning Transfer

- Computer Science
- 2009

This chapter contains sections titled: Introduction, Conditional and Generative Models, Real-Life Reasons for Dataset Shift, Simple Covariate Shift, Prior Probability Shift, Sample Selection Bias, Imbalanced Data, Domain Shift, Source Component Shift, and Gaussian Process Methods.

Causal Inference Using the Algorithmic Markov Condition

- Mathematics, Computer Science
- IEEE Transactions on Information Theory
- 2010

This work explains why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov-equivalent causal graphs.

Causal Models as Minimal Descriptions of Multivariate Systems

- 2006

By applying the minimality principle for model selection, one should seek the model that describes the data by a code of minimal length. Learning is viewed as data compression that exploits the…

On the Identifiability of the Post-Nonlinear Causal Model

- Computer Science, Mathematics
- UAI
- 2009

It is shown that this post-nonlinear causal model is identifiable in most cases; by enumerating all situations in which the model is not identifiable, sufficient conditions for its identifiability are obtained.