• Corpus ID: 173990271

An Evaluation Toolkit to Guide Model Selection and Cohort Definition in Causal Inference

Yishai Shimoni, Ehud Karavani, Sivan Ravid, Peter Bak, Tan Hung Marie Ng, Sharon Hensley Alford, Denise Meade, Yaara Goldschmidt
Real world observational data, together with causal inference, allow the estimation of causal effects when randomized controlled trials are not available. To be accepted into practice, such predictive models must be validated for the dataset at hand, and thus require a comprehensive evaluation toolkit, as introduced here. Since effect estimation cannot be evaluated directly, we turn to evaluating the various observable properties of causal inference, namely the observed outcome and treatment… 

Finding Valid Adjustments under Non-ignorability with Minimal DAG Knowledge
This work shows that under a different expert-driven structural knowledge — that one variable is a direct causal parent of the treatment variable — remarkably, testing for subsets (not involving the known parent variable) that are valid back-doors is equivalent to an invariance test.
Positivity Validation Detection and Explainability via Zero Fraction Multi-Hypothesis Testing and Asymmetrically Pruned Decision Trees
This paper presents the problem of automatic positivity analysis and proposes an algorithm based on a two-step process that first models the propensity conditioned on the covariates and then analyzes the resulting distribution using multiple hypothesis testing to create positivity violation labels.
High-Dimensional Feature Selection for Sample Efficient Treatment Effect Estimation
A common objective function involving outcomes across treatment cohorts, with nonconvex joint sparsity regularization, is proposed; it is guaranteed to recover $S$ with high probability under a linear outcome model for $Y$ and subgaussian covariates for each of the treatment cohorts.
RealCause: Realistic Causal Inference Benchmarking
Using flexible generative models, a benchmark that both yields ground-truth and is realistic is provided and 66 different causal estimators are evaluated.
A causal inference approach for estimating effects of non-pharmaceutical interventions during Covid-19 pandemic
In response to the outbreak of the coronavirus disease 2019 (Covid-19), governments worldwide have introduced multiple restriction policies, known as non-pharmaceutical interventions (NPIs). However,
Survey on Causal-based Machine Learning Fairness Notions
This paper examines an exhaustive list of causal-based fairness notions, in particular their applicability in real-world scenarios, and compiles the most relevant identifiability criteria for the problem of fairness from the extensive literature on identifiability theory.
High-Throughput Clinical Trial Emulation with Real World Data and Machine Learning: A Case Study of Drug Repurposing for Alzheimer's Disease
This paper emulates 430,000 trials from two large-scale RWD warehouses, targeting new indications of approved drugs for Alzheimer's disease, and demonstrates that a regularized logistic-regression-based propensity score (PS) model outperforms a deep-learning-based PS model and others, which to a certain extent contradicts the authors' intuitions.
Predictive and Causal Analysis of No-Shows for Medical Exams During COVID-19: A Case Study of Breast Imaging in a Nationwide Israeli Health Organization
The results imply that a patient's perceived risk of cancer and the COVID-19 time-based factors are major predictors of no-shows, and it is revealed that closures impact patients over 60, but not patients undergoing advanced diagnostic examinations.
A commentary on “Emulated Clinical Trials from Longitudinal Real World Data Efficiently Identify Candidates for Neurological Disease Modification: Examples from Parkinson’s disease” and its Relevance to COVID-19 challenges
Once a large enough cohort of COVID-19 patients is assembled, the methods described in the paper, including the procedures for assessing beneficial drugs for neurological conditions, have the potential to find existing drugs that can help long-term COVID-19 sufferers.


Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis
This work presents a comprehensive framework for benchmarking algorithms that estimate causal effects. The benchmark data combine real-world covariates with simulated treatment assignments and outcomes, which provides the basis for validation.
Distinguishing Cause from Effect Using Observational Data: Methods and Benchmarks
Empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using only purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions.
Automated versus Do-It-Yourself Methods for Causal Inference: Lessons Learned from a Data Analysis Competition
The causal inference data analysis challenge, "Is Your SATT Where It's At?", launched as part of the 2016 Atlantic Causal Inference Conference, sought to make progress with respect to both the data testing grounds and the researchers submitting methods whose efficacy would be evaluated.
Estimating individual treatment effect: generalization bounds and algorithms
A novel, simple and intuitive generalization-error bound is given, showing that the expected ITE estimation error of a representation is bounded by the sum of the standard generalization error of that representation and the distance between the treated and control distributions induced by the representation.
A Second Chance to Get Causal Inference Right: A Classification of Data Science Tasks
It is argued that a failure to adequately describe the role of subject-matter expert knowledge in data analysis is a source of widespread misunderstandings about data science and how to guide decision-making in the real world and to train data scientists.
Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data
The framework’s ability to develop reproducible models that can be readily shared is illustrated; it offers the potential to perform extensive external validation of models and to improve their likelihood of clinical uptake.
Causal inference in statistics: An overview
This review presents empirical researchers with recent advances in causal inference, and stresses the paradigmatic shifts that must be undertaken in moving from traditional statistical analysis to
Assessing Sensitivity to an Unobserved Binary Covariate in an Observational Study with Binary Outcome
This paper proposes a simple technique for assessing the range of plausible causal conclusions from observational studies with a binary outcome and an observed categorical covariate. The technique
Matching methods for causal inference: A review and a look forward.
  • E. Stuart
  • Economics
    Statistical science : a review journal of the Institute of Mathematical Statistics
  • 2010
A structure for thinking about matching methods and guidance on their use is provided, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.
Estimating causal effects from epidemiological data
This article reviews a condition that permits the estimation of causal effects from observational data, and two methods—standardisation and inverse probability weighting—to estimate population causal effects under that condition.
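The two estimators named in the summary above, standardisation and inverse probability weighting, can be illustrated with a minimal sketch. It is not code from the cited article: the toy records in DATA, the single binary covariate, and the function names are all hypothetical, and it assumes every covariate stratum contains both treated and untreated units.

```python
from collections import defaultdict

# Hypothetical toy data, not from any cited paper: (covariate x, treatment a, outcome y).
DATA = [
    (0, 1, 1), (0, 0, 0), (0, 0, 0), (0, 0, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0),
]


def _mean(values):
    return sum(values) / len(values)


def standardised_ate(data):
    """Average the within-stratum outcome differences, weighted by P(x)."""
    by_x = defaultdict(list)
    for x, a, y in data:
        by_x[x].append((a, y))
    n = len(data)
    ate = 0.0
    for rows in by_x.values():
        stratum_weight = len(rows) / n
        mean_treated = _mean([y for a, y in rows if a == 1])
        mean_control = _mean([y for a, y in rows if a == 0])
        ate += stratum_weight * (mean_treated - mean_control)
    return ate


def ipw_ate(data):
    """Weight each unit by the inverse probability of the treatment it received."""
    by_x = defaultdict(list)
    for x, a, _ in data:
        by_x[x].append(a)
    # Propensity P(A=1 | x), estimated here as the treated fraction per stratum.
    propensity = {x: _mean(treatments) for x, treatments in by_x.items()}
    total = 0.0
    for x, a, y in data:
        e = propensity[x]
        total += a * y / e - (1 - a) * y / (1 - e)
    return total / len(data)


if __name__ == "__main__":
    print(standardised_ate(DATA), ipw_ate(DATA))
```

With stratum-level (saturated) estimates such as these, the two estimators agree on the toy data, whereas the unadjusted difference in group means gives a different value because treatment is more common when x = 1.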