# Full Law Identification In Graphical Models Of Missing Data: Completeness Results

@article{Nabi2020FullLI, title={Full Law Identification In Graphical Models Of Missing Data: Completeness Results}, author={Razieh Nabi and Rohit Bhattacharya and Ilya Shpitser}, journal={Proceedings of Machine Learning Research}, year={2020}, volume={119}, pages={7153-7163} }

Missing data has the potential to affect analyses conducted in all fields of scientific study including healthcare, economics, and the social sciences. Several approaches to unbiased inference in the presence of non-ignorable missingness rely on the specification of the target distribution and its missingness process as a probability distribution that factorizes with respect to a directed acyclic graph. In this paper, we address the longstanding question of the characterization of models that…

## 21 Citations

### Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data

- Computer Science, Mathematics
- NeurIPS
- 2020

This work proposes an estimator of the loading coefficients and a data imputation method, both based on estimators of the means, variances, and covariances of the missing variables, and proves identifiability of the PPCA parameters.

### A Robust Functional EM Algorithm for Incomplete Panel Count Data

- Computer Science
- NeurIPS
- 2020

This work proposes a simple yet widely applicable functional EM algorithm to estimate the counting process mean function, which is of central interest to behavioral scientists, and illustrates the utility of the algorithm through numerical experiments and an analysis of smoking cessation data.

### The Importance of Modeling Data Missingness in Algorithmic Fairness: A Causal Perspective

- Computer Science
- AAAI
- 2021

This work shows conditions under which various distributions used in popular fairness algorithms can or cannot be recovered from the training data, and uses causal graphs to characterize the missingness mechanisms in different real-world scenarios.

### Neumann networks: differential programming for supervised learning with missing values

- Computer Science
- NeurIPS
- 2020

This work derives the analytical form of the optimal predictor under a linearity assumption and various missing data mechanisms, and proposes a new principled architecture, named Neumann networks, that achieves good predictive accuracy with a number of parameters and a computational complexity that are both independent of the number of missing data patterns.

### Leveraging Structured Biological Knowledge for Counterfactual Inference: A Case Study of Viral Pathogenesis

- Computer Science
- IEEE Transactions on Big Data
- 2021

This article proposes a general approach for querying a causal biological knowledge graph, and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question.

### Do-calculus enables causal reasoning with latent variable models

- Computer Science
- ArXiv
- 2021

This work demonstrates that a latent variable model (LVM) can answer any causal query posed post-training, provided that the query can be identified from the observed variables according to the do-calculus rules.

### Pattern graphs: A graphical approach to nonmonotone missing data

- Mathematics, Computer Science
- The Annals of Statistics
- 2022

This work introduces the concept of pattern graphs (directed acyclic graphs representing how response patterns are associated), proposes three graph-based sensitivity analyses, and studies the equivalence class of pattern graphs.

### Optimal Training of Fair Predictive Models

- Computer Science
- CLeaR
- 2022

This work shows how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints.

### Causal and counterfactual views of missing data models

- Computer Science
- ArXiv
- 2022

This work makes explicit how the missing data problem of recovering the complete data law from the observed law can be viewed as identification of a joint distribution over counterfactual variables, namely the values that would have been seen had it been possible (possibly contrary to fact) to observe them.

### Identifying Counterfactual Queries with the R Package cfid

- Computer Science
- 2022

The R package cfid is presented that implements the ID* and IDC* algorithms, analogous to the ID and IDC algorithms by Shpitser and Pearl (2006b,a) for identification of interventional distributions, which were implemented in R by Tikka and Karvanen (2017) in the causaleffect package.

## References

### Using causal diagrams to guide analysis in missing data problems

- Computer Science
- Statistical Methods in Medical Research
- 2012

It is shown that using causal diagrams to represent additional assumptions regarding the mechanism giving rise to the missing data both complements and clarifies some of the central issues in missing data theory.

### Identification In Missing Data Models Represented By Directed Acyclic Graphs

- Computer Science, Mathematics
- UAI
- 2019

This paper proposes a new algorithm that significantly generalizes the types of manipulations used in the ID algorithm, developed in the context of causal inference, in order to obtain identification.

### Structure Learning Under Missing Data

- Computer Science
- PGM
- 2018

This paper discusses adjustments that must be made to existing structure learning algorithms to properly account for missing data, and gives an algorithm for the simpler setting where the underlying graph is unknown, but the missing data model is known.

### Missing Data as a Causal and Probabilistic Problem

- Computer Science, Mathematics
- UAI
- 2015

This paper extends the converse approach of [7], which represents missing data problems as causal models where only interventions on missingness indicators are allowed, and gives a general criterion for cases where a joint distribution containing missing variables can be recovered from the data actually observed, given assumptions on the missingness mechanisms.

### Selection of auxiliary variables in missing data problems: Not all auxiliary variables are created equal

- Mathematics
- 2013

The treatment of missing data in the social sciences has changed tremendously during the last decade. Modern missing data techniques such as multiple imputation and full-information maximum…

### Consistent Estimation of Functions of Data Missing Non-Monotonically and Not at Random

- Mathematics, Computer Science
- NIPS
- 2016

The proposed estimators are generalized inverse probability weighting estimators that permit identification from the observed data law and admit a natural fitting procedure based on the pseudo-likelihood approach of Besag (1975).

### Graphical Models for Inference with Missing Data

- Computer Science, Mathematics
- NIPS
- 2013

This work employs a formal representation called 'Missingness Graphs' to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured.

### Semiparametric Inference for Nonmonotone Missing-Not-at-Random Data: The No Self-Censoring Model

- Mathematics
- Journal of the American Statistical Association
- 2022

A practical augmented inverse probability weighted estimator is proposed, and in the setting with a (possibly high-dimensional) always-observed subset of covariates, the proposed estimator enjoys a certain double-robustness property.

### Discrete Choice Models for Nonmonotone Nonignorable Missing Data: Identification and Inference.

- Mathematics
- Statistica Sinica
- 2018

This paper proposes an all-purpose approach that delivers semiparametric inferences when missing data are nonmonotone and not at random; it is based on a discrete choice model (DCM) as a means to generate a large class of nonmonotone nonresponse mechanisms that are nonignorable.

### Non-response models for the analysis of non-monotone ignorable missing data.

- Mathematics
- Statistics in Medicine
- 1997

It is shown that there exist ignorable missing data processes that are not RMM, and that it may be inappropriate to analyse non-monotone missing data under the assumption that the missingness mechanism is ignorable if a statistical test has rejected the hypothesis that the missing data process is RMM representable.