Score matching enables causal discovery of nonlinear additive noise models

Paul Rolland, Volkan Cevher, Matthäus Kleindessner, Chris Russell, Bernhard Schölkopf, Dominik Janzing, Francesco Locatello
This paper demonstrates how to recover causal graphs from the score of the data distribution in nonlinear additive (Gaussian) noise models. Using score matching algorithms as a building block, we show how to design a new generation of scalable causal discovery methods. To showcase our approach, we also propose a new efficient method for approximating the score’s Jacobian, which enables recovery of the causal graph. Empirically, we find that the new algorithm, called SCORE, is competitive with state-of…
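To make the role of the score's Jacobian concrete, here is a minimal sketch of the leaf criterion as we understand it: in an additive Gaussian noise model, the diagonal Jacobian entry belonging to a leaf node is constant across samples, so its empirical variance is (near) zero. The two-variable model X2 = X1^2 + N2 and the function name `score_jacobian_diag` are illustrative assumptions, and the analytic Jacobian stands in for the score-matching estimate that the paper learns from data.

```python
import numpy as np

def score_jacobian_diag(x, sigma=1.0):
    """Diagonal of the Jacobian of the score (Hessian of log p) for the toy model
    X1 ~ N(0, 1), X2 = X1**2 + N2, N2 ~ N(0, sigma**2).
    Derived analytically for this hypothetical example; not a learned estimate."""
    x1, x2 = x[:, 0], x[:, 1]
    resid = x2 - x1**2
    d11 = -1.0 + 2.0 * resid / sigma**2 - 4.0 * x1**2 / sigma**2
    d22 = np.full_like(x2, -1.0 / sigma**2)  # constant in x: X2 is a leaf
    return np.stack([d11, d22], axis=1)

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1**2 + rng.normal(size=n)
x = np.stack([x1, x2], axis=1)

# Variance of each diagonal entry across samples; the leaf minimizes it.
variances = score_jacobian_diag(x).var(axis=0)
leaf = int(np.argmin(variances))
print("per-node variance:", variances, "estimated leaf index:", leaf)
```

Repeating this step after removing the identified leaf would yield a topological order one node at a time; pruning spurious edges from that order is a separate step not shown here.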


Independence Testing-Based Approach to Causal Discovery under Measurement Error and Linear Non-Gaussian Models

This work proposes the Transformed Independent Noise (TIN) condition, which checks for independence between a specific linear transformation of some measured variables and certain other measured variables, and is informative about the graph structure among the unobserved target variables.

Diffusion Models for Causal Discovery via Topological Ordering

DiffAN, a topological ordering algorithm inspired by recent innovations in diffusion probabilistic models (DPMs), is proposed; it leverages DPMs and introduces theory for updating the learned Hessian without retraining the neural network.

Nonlinear Causal Discovery via Kernel Anchor Regression

This work tackles the nonlinear setting by proposing kernel anchor regression (KAR). Beyond the natural formulation using a classic two-stage least squares estimator, it also studies an improved variant that involves nonparametric regression in three separate stages.

Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders

An algorithm is provided that learns the causal model parameters by pooling data from different regimes and jointly maximizing the combined likelihood under non-linear continuous structural causal models with additive, multivariate Gaussian noise, even when unobserved confounders are present.

Inference for a Large Directed Acyclic Graph with Unspecified Interventions

Inference of directed relations given some unspecified interventions, that is, the target of each intervention is unknown, is challenging. In this article, we test hypothesized directed relations with



Gradient-Based Neural DAG Learning

A novel score-based approach to learning a directed acyclic graph (DAG) from observational data that outperforms current continuous methods on most tasks, while being competitive with existing greedy search methods on important metrics for causal inference.

CAM: Causal Additive Models, high-dimensional order search and penalized regression

This work substantially simplifies the problem of structure search and estimation for an important class of causal models by establishing consistency of the (restricted) maximum likelihood estimator for low- and high-dimensional scenarios, and allowing for misspecification of the error distribution.

DAGs with NO TEARS: Continuous Optimization for Structure Learning

This paper formulates the structure learning problem as a purely continuous optimization problem over real matrices that avoids the combinatorial acyclicity constraint entirely, using a novel characterization of acyclicity that is not only smooth but also exact.
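As a rough illustration of what a smooth, exact acyclicity characterization can look like, the sketch below evaluates h(W) = tr(exp(W ∘ W)) − d, which, as we recall it from the NO TEARS paper, is zero exactly when the weighted adjacency matrix W describes a DAG. The function name `notears_acyclicity`, the truncated power series used for the matrix exponential, and the example weights are our own assumptions.

```python
import numpy as np

def notears_acyclicity(w, n_terms=60):
    """h(W) = tr(exp(W * W)) - d, with exp computed by a truncated power series.
    The truncation (n_terms) is an illustrative shortcut for small matrices."""
    a = w * w                     # elementwise square: nonnegative edge weights
    d = a.shape[0]
    power = np.eye(d)             # holds A^k / k!, starting at k = 0
    trace_exp = 0.0
    for k in range(n_terms):
        trace_exp += np.trace(power)
        power = power @ a / (k + 1)
    return trace_exp - d

# Chain 0 -> 1 -> 2: acyclic, so h should vanish.
w_dag = np.array([[0.0, 0.8, 0.0],
                  [0.0, 0.0, -1.2],
                  [0.0, 0.0, 0.0]])

# Adding the edge 2 -> 0 closes a cycle, so h should become positive.
w_cyclic = w_dag.copy()
w_cyclic[2, 0] = 0.5

h_dag = notears_acyclicity(w_dag)
h_cyclic = notears_acyclicity(w_cyclic)
print("h(DAG) =", h_dag, " h(cyclic) =", h_cyclic)
```

Because h is differentiable in W, it can serve as an equality constraint in a standard continuous optimizer; the optimization loop itself is beyond this sketch.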

Causation, Prediction, and Search


Optimal Structure Identification With Greedy Search

This paper proves the so-called "Meek Conjecture", which states that if a DAG H is an independence map of another DAG G, then there exists a finite sequence of edge additions and covered edge reversals in G such that H remains an independence map of G after each modification and G = H after all modifications.

Score-Based Generative Classifiers

The tremendous success of generative models in recent years raises the question whether they can also be used to perform classification. Generative models have been used as adversarially robust

Ordering-Based Causal Discovery with Reinforcement Learning

This work formulates the ordering search problem as a multi-step Markov decision process, implements the ordering generating process with an encoder-decoder architecture, and uses RL to optimize the proposed model based on the reward mechanisms designed for each ordering.

Hessian Estimation via Stein's Identity in Black-Box Problems

This work establishes a novel Hessian approximation scheme and compares it with the second-order simultaneous perturbation stochastic approximation (2SPSA) algorithm (Spall, 2000): 2SPSA requires four zeroth-order (ZO) queries, whereas the proposed scheme requires three.

Beware of the Simulated DAG! Varsortability in Additive Noise Models

This work introduces varsortability as a measure of agreement between the ordering by marginal variance and the causal order in additive noise models and shows how it dominates the performance of continuous structure learning algorithms on synthetic data.
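The idea of agreement between the marginal-variance order and the causal order can be sketched with a simplified proxy: the fraction of ancestor-descendant pairs in which the descendant has the larger marginal variance. This is not the paper's exact definition of varsortability; the helper names `reachability` and `variance_order_agreement`, the linear chain model, and the rescaling factors are all assumptions chosen for illustration.

```python
import numpy as np

def reachability(adj):
    """Boolean matrix whose (i, j) entry is True if a directed path i -> ... -> j exists."""
    adj = adj.astype(int)
    reach = adj > 0
    for _ in range(adj.shape[0]):
        # Extend every known path by one more edge.
        reach = reach | ((reach.astype(int) @ adj) > 0)
    return reach

def variance_order_agreement(x, adj):
    """Share of ancestor-descendant pairs where the descendant's variance is larger.
    A simplified proxy, not the paper's definition."""
    var = x.var(axis=0)
    anc, desc = np.nonzero(reachability(adj))
    if len(anc) == 0:
        return float("nan")
    return float(np.mean(var[desc] > var[anc]))

# Hypothetical linear chain 0 -> 1 -> 2 with edge weight 2 and unit noise.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(size=n)
x = np.stack([x0, x1, x2], axis=1)
adj = np.array([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]])

agreement_raw = variance_order_agreement(x, adj)
# Rescaling columns (a pure change of units) reverses the variance order.
agreement_rescaled = variance_order_agreement(x * np.array([1.0, 0.1, 0.01]), adj)
print(agreement_raw, agreement_rescaled)
```

The contrast between the raw and rescaled data shows why such a quantity depends on the measurement scale of the variables rather than on the causal structure alone.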