• Corpus ID: 235313807

Multiple Imputation Through XGBoost

Yong Deng and Thomas Lumley
Multiple imputation is increasingly used in tackling missing data. While some conventional multiple imputation approaches are well studied and have shown empirical validity, they entail limitations in processing large datasets with complex data structures. Their imputation performances usually rely on proper specifications of imputation models, which require expert knowledge of the inherent relations among variables. In addition, these standard approaches tend to be computationally inefficient…

mixgb: Multiple Imputation Through 'XGBoost', R package version 0.1.1

  • 2022

MissRanger: Fast Imputation of Missing Values, R package version 2.1.0

  • 2019

MICE: Multivariate Imputation by Chained Equations in R

Mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.

Improving the Efficiency of Relative-Risk Estimation in Case-Cohort Studies

A class of weighted estimators with general time-varying weights that are related to a class of estimators proposed by Robins, Rotnitzky, and Zhao are developed and shown to be consistent and asymptotically normal under appropriate conditions.

Searching for exotic particles in high-energy physics with deep learning.

It is shown that deep-learning methods need no manually constructed inputs and yet improve the classification metric by as much as 8% over the best current approaches, demonstrating that deep learning approaches can improve the power of collider searches for exotic particles.

R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria

  • 2022

The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning

This work proposes an accurate, fast, and scalable approach to multiple imputation, called MIDAS (Multiple Imputation with Denoising Autoencoders), which employs a class of unsupervised neural networks known as denoising autoencoders, designed to reduce dimensionality by corrupting and attempting to reconstruct a subset of data.

Community-based progress indicators for prevention of mother-to-child transmission and mortality rates in HIV-exposed children in rural Mozambique

Community-based subnational assessments of progress towards EMTCT are needed to complement clinic-based and modeling estimates, and SPECTRUM modeling estimated 15% MTCT, higher than district-level community-based estimates of MTCT among HIV-exposed children.

Sex-stratified gene-by-environment genome-wide interaction study of trauma, posttraumatic-stress, and suicidality