• Corpus ID: 49667628

Causal discovery in the presence of missing data

Ruibo Tu, Cheng Zhang, Paul W. Ackermann, Hedvig Kjellström, Kun Zhang

Missing data are ubiquitous in many domains such as healthcare. When these data entries are not missing completely at random, the (conditional) independence relations in the observed data may be di ... 

Causal Discovery in the Presence of Missing Values for Neuropathic Pain Diagnosis

The constraint-based causal discovery method PC is extended to handle binary data sets with missing values for neuropathic pain diagnosis, and the potential errors of simply applying PC to data sets with missing values are identified.

MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models

MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization (EM) framework; its flexibility for incorporating various causal discovery algorithms and its efficacy are demonstrated through extensive simulations and real data experiments.

Full Law Identification In Graphical Models Of Missing Data: Completeness Results

This paper provides the first completeness result in this field of study: necessary and sufficient graphical conditions under which the full data distribution can be recovered from the observed data distribution.

A practical guide to causal discovery with cohort data

This guide shows how to perform constraint-based causal discovery using three popular software packages (pcalg, bnlearn, and TETRAD), points out the relative strengths and limitations of each package, and gives practical recommendations.

Greedy structure learning from data that contains systematic missing values

The empirical investigations show that the proposed approach outperforms the commonly used, state-of-the-art Structural EM algorithm in terms of both learning accuracy and efficiency, whether data are missing at random or not at random.

Multiple imputation and test‐wise deletion for causal discovery with incomplete cohort data

This article establishes necessary and sufficient conditions for the recoverability of causal structures under test‐wise deletion, and argues that multiple imputation is more challenging in the context of causal discovery than for estimation.

MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms

This work develops a regularization scheme that encourages any baseline imputation method to be causally consistent with the underlying data generating mechanism, and proposes a causally-aware imputation algorithm, MIRACLE, that is able to consistently improve imputation over a variety of benchmark methods.

Star-causality and factor analysis: old stories and new perspectives

L. Xu. Applied Informatics, 2017.

In this paper, studies on conditional independence-based causality are briefly reviewed along a line of observable two-variable, three-variable, star-decomposable, and tree-decomposable cases, as well as their relationship to factor analysis.

Causal discovery of gene regulation with incomplete data

This work applied causal discovery to obtain novel insights into the genetic regulation underlying head‐and‐neck squamous cell carcinoma, and proposed a new procedure combining constraint‐based causal discovery with multiple imputation based on using Rubin's rules for pooling tests of conditional independence.

On Testability and Goodness of Fit Tests in Missing Data Models

New insights are provided on the testable implications of three broad classes of missing data graphical models, and how to design goodness-of-fit tests around them.



Graphical Models for Inference with Missing Data

This work employs a formal representation called ‘Missingness Graphs’ to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured.

Missing Data as a Causal and Probabilistic Problem

This paper extends the converse approach of [7] of representing missing data problems to causal models where only interventions on missingness indicators are allowed, giving a general criterion for cases where a joint distribution containing missing variables can be recovered from data actually observed, given assumptions on missingness mechanisms.

Identification In Missing Data Models Represented By Directed Acyclic Graphs

This paper proposes a new algorithm that significantly generalizes the types of manipulations used in the ID algorithm, developed in the context of causal inference, in order to obtain identification.

Handling hybrid and missing data in constraint-based causal discovery to study the etiology of ADHD

A new method is developed, based on the assumptions that data are missing at random and that continuous variables obey a nonparanormal distribution, which helps in understanding the etiology of attention-deficit/hyperactivity disorder (ADHD).


Two results are presented concerning inference when data may be missing. First, ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the

Structure Learning Under Missing Data

This paper discusses adjustments that must be made to existing structure learning algorithms to properly account for missing data, and gives an algorithm for the simpler setting where the underlying graph is unknown, but the missing data model is known.

Estimation with Incomplete Data: The Linear Case

This work devises model-based methods to consistently estimate mean, variance and covariance given data that are Missing Not At Random (MNAR), and extends the analysis to continuous variables drawn from Gaussian distributions.

On the Testability of Models with Missing Data

This work uses the results to show that model sensitivity persists in almost all models typically categorized as MNAR, and provides sufficient conditions to detect the existence of dependence between a variable and its missingness mechanism.

Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data

It is shown that causal queries may be recoverable even when the factors in their identifying estimands are not recoverable; applying these results to problems of attrition, the paper characterizes the recovery of causal effects from data corrupted by attrition.

A Linear Non-Gaussian Acyclic Model for Causal Discovery

This work shows how to discover the complete causal structure of continuous-valued data, under the assumptions that (a) the data generating process is linear, (b) there are no unobserved confounders, and (c) disturbance variables have non-Gaussian distributions of non-zero variances.