• Corpus ID: 309894

Robust Propensity Score Computation Method based on Machine Learning with Label-corrupted Data

Authors: Chen Wang, Suzhen Wang, Fuyan Shi, Zaixiang Wang
In biostatistics, the propensity score is commonly used to assess covariate imbalance and to adjust for confounding covariates so that differences between groups are eliminated. Although many methods exist for computing propensity scores, a common problem for all of them is corrupted labels in the dataset. For example, data collected from patients could contain samples that were treated mistakenly, and the computing methods could incorporate them as misleading information. In this… 
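As a rough illustration of the problem the abstract describes, the sketch below fits a baseline propensity score, P(T=1 | x), by plain logistic regression on synthetic data, once with clean treatment labels and once after a fraction of the labels has been flipped. It is not the robust method proposed in the paper (the abstract is truncated here). The data-generating model, the 20% flip rate, and all function names are assumptions made only for this example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_propensity(X, t, lr=0.1, epochs=500):
    """Fit P(T=1 | x) by logistic regression with batch gradient descent.

    X is a list of covariate vectors, t a list of 0/1 treatment labels.
    Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, ti in zip(X, t):
            # Gradient of the log-loss with respect to the linear score.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - ti
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def propensity(x, w, b):
    """Estimated probability of treatment for covariate vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def flip_labels(t, rate, rng):
    """Corrupt labels: flip each 0/1 label independently with probability rate."""
    return [1 - ti if rng.random() < rate else ti for ti in t]


# Synthetic data (assumption): one covariate, true model P(T=1 | x) = sigmoid(2x).
rng = random.Random(0)
X = [[rng.gauss(0, 1)] for _ in range(400)]
t_clean = [1 if rng.random() < sigmoid(2.0 * x[0]) else 0 for x in X]
t_noisy = flip_labels(t_clean, 0.2, rng)  # 20% of treatment labels corrupted

w_clean, b_clean = fit_propensity(X, t_clean)
w_noisy, b_noisy = fit_propensity(X, t_noisy)

print("slope fitted on clean labels:", round(w_clean[0], 3))
print("slope fitted on noisy labels:", round(w_noisy[0], 3))
```

In this setup, symmetric label flips pull the fitted slope toward zero, so the estimated propensity scores are compressed toward 0.5. This is one way corrupted labels can mislead a standard estimator.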


Use of machine learning for comparing disease risk scores and propensity scores under complex confounding and large sample size scenarios: a simulation study

Under strong confounding with a large sample size, DRS reduced bias compared to PS in scenarios with low treatment prevalence, whilst PS was preferable for the study of treatments with prevalence greater than 10%, regardless of the outcome prevalence.

Bagging of Xgboost Classifiers with Random Under-sampling and Tomek Link for Noisy Label-imbalanced Data

A bagging-based algorithm with Xgboost classifier (Gradient Boosting Machine-based classifier with convenient parameter tuning interface) and under-sampling approaches to overcome the challenge of noisy imbalanced data classification is proposed.

Causal Inference for Survival Analysis

A real-world healthcare dataset of about 1800 patients with breast cancer was used; it contains multiple patient covariates as well as disease-free survival (DFS) days and a binary death-event indicator.

Deep Learning for Causal Inference

This paper proposes the use of deep neural networks (DNNs) for propensity score matching, and presents a network called PropensityNet, a generalization of the logistic regression technique traditionally used to estimate propensity scores.

Balance Regularized Neural Network Models for Causal Effect Estimation

This work is motivated by representation learning techniques to reduce differences between treated and untreated distributions that potentially arise due to confounding factors and regularize the model by encouraging it to predict control outcomes for individuals in the treatment group that are similar to control outcomes in the control group.

Predicting Acute Kidney Injury: A Machine Learning Approach Using Electronic Health Records

Machine learning techniques are employed to identify older patients who have a risk of readmission with AKI to the hospital or emergency department within 90 days after discharge, which provides healthcare providers enough time to intervene before the onset of AKI.

Sales Forecasting: Machine Learning Solution to B2B Sales Opportunity Win-Propensity Computation

This research proposes a customised model stack involving Random Forests, GLM, boosting, trees, and neural networks for computing a sales win-propensity score for B2B software sales.

Visual Analytics of Electronic Health Records with a focus on Acute Kidney Injury

This book aims to provide a forum for discussion of the role of emotion and self-consistency in decision-making in the rapidly changing environment.



Propensity score prediction for electronic healthcare databases using super learner and high-dimensional propensity score methods

Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.

Using classification tree analysis to generate propensity score weights

This work introduces classification tree analysis (CTA) to generate PS. CTA is a "decision-tree"-like classification model that provides accurate, parsimonious decision rules that are easy to display and interpret, reports P values derived via permutation tests, and evaluates cross-generalizability.

Probability estimation with machine learning methods for dichotomous and multicategory outcome: Theory

The general validity of the machine learning methods is demonstrated, although each method fails in at least one simulation scenario, and recommendations for selecting and tuning the methods are given.

Improving propensity score estimators' robustness to model misspecification using super learner.

The results suggest that use of SL to estimate the PS can improve covariate balance and reduce bias in a meaningful manner in cases of serious model misspecification for treatment assignment.

Learning with Noisy Labels

The problem of binary classification in the presence of random classification noise is theoretically studied—the learner sees labels that have independently been flipped with some small probability, and methods used in practice such as biased SVM and weighted logistic regression are provably noise-tolerant.

Estimating propensity scores with missing covariate data using general location mixture models

This work proposes a general location mixture model for imputations that assumes that the control units are a latent mixture of (i) units whose covariates are drawn from the same distributions as the treated units' covariates and (ii) units whose covariates are drawn from different distributions.

A tutorial on propensity score estimation for multiple treatments using generalized boosted models

This paper defines the causal quantities that may be of interest to studies of multiple treatments, derives weighted estimators of those quantities, and proposes the use of generalized boosted models (GBM) for estimation of the necessary propensity score weights.

Learning from Noisy Labels with Distillation

This work proposes a unified distillation framework to use “side” information, including a small clean dataset and label relations in knowledge graph, to “hedge the risk” of learning from noisy labels, and proposes a suite of new benchmark datasets to evaluate this task in Sports, Species and Artifacts domains.

Evaluation of the Effect of a Continuous Treatment: A Machine Learning Approach with an Application to Treatment for Traumatic Brain Injury

This paper introduces a machine learning method, the ‘Super Learner’, to address model selection in this context, and finds that transfer time does not have a statistically significant marginal effect on the outcomes.

The central role of the propensity score in observational studies for causal effects

Abstract: The results of observational studies are often disputed because of nonrandom treatment assignment. For example, patients at greater risk may be overrepresented in some treatment group.