Multiple imputation by chained equations in praxis: Guidelines and review
@article{Wulff2017MultipleIB,
  title={Multiple imputation by chained equations in praxis: Guidelines and review},
  author={Jesper N. Wulff and Linda Ejlskov Jeppesen},
  journal={The Electronic Journal of Business Research Methods},
  year={2017},
  volume={15},
  pages={41--56}
}
Multiple imputation by chained equations (MICE) is an effective tool to handle missing data, an almost unavoidable problem in quantitative data analysis. However, despite the empirical and theoretical evidence supporting the use of MICE, researchers in the social sciences often resort to inferior approaches, unnecessarily risking erroneous results. The complexity of the decision process when encountering missing data may be what is discouraging potential users from adopting the appropriate…
62 Citations
Addressing missing values in routine health information system data: an evaluation of imputation methods using data from the Democratic Republic of the Congo during the COVID-19 pandemic
- Medicine, Population Health Metrics
- 2021
Seven commonly used imputation methods are implemented, and multiple imputation is recommended for addressing missing values in RHIS datasets, along with appropriate handling of data structure to minimize imputation standard errors.
Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values
- Computer Science, AAAI
- 2022
This paper proposes an integrated approach based on decision trees that does not require a separate process of imputation and learning, and trains a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and optimizes a fairness-regularized objective function.
A multiple imputation approach to evaluate the accuracy of diagnostic tests in presence of missing values
- Computer Science, Communications in Mathematical Biology and Neuroscience
- 2022
The article aims to use a multiple imputation approach to evaluate binary diagnostic tests with missing data under the MCAR mechanism, and the proposed approach is applied to a real data set.
biokNN: A bi-objective imputation method for multilevel data in R
- Computer Science
- 2021
The biokNN package provides functions to produce single and multiple imputation for data with continuous variables along with visualization tools to analyze the structure of the missing values among classes and variables.
Causal Inference via Nonlinear Variable Decorrelation for Healthcare Applications
- Computer Science, ArXiv
- 2022
A novel method is proposed that uses a variable decorrelation regularizer to handle both linear and nonlinear confounding and employs association rules, mined from the original features, as new representations to further approximate human decision patterns and increase model interpretability.
Filling the gaps: imputation of missing metrics' values in a software quality model
- Computer Science, IWSM-Mensura
- 2017
A few imputation methods are empirically validated in the context of a custom Géant-QM framework, used for evaluation of several open source systems; the results indicate that imputing a missing value based on its close neighbors as data donors introduces less noise than using a wider set of donors.
Machine Learning in the Analysis of Social Problems: The Case of Global Human Trafficking
- Computer Science
- 2019
Results show that MICE showed some effectiveness in handling missing data, while agglomerative hierarchical clustering was successful in identifying distinct and describable clusters from the three time periods into which the imputed dataset was segmented.
Tuberculosis treatment outcomes of notified cases: trends and determinants of potential unfavourable outcome, France, 2008 to 2014
- Medicine, Euro surveillance : bulletin Europeen sur les maladies transmissibles = European communicable disease bulletin
- 2020
Monitoring of treatment outcome is improving over time, but treatment outcome monitoring needs to be strengthened in cases belonging to population groups where the percentage of unfavourable outcome is the highest and in cases where surveillance data shows poorer documented follow-up.
Risk factors for increased COVID-19 case-fatality in the United States: A county-level analysis during the first wave
- Medicine, medRxiv
- 2021
County-level variables associated with the COVID-19 case-fatality rate (CFR) using publicly available datasets and a negative binomial generalized linear model are identified to help officials target public health interventions and healthcare resources to locations that are at increased risk of COVID-19 fatalities.
Risk factors for increased COVID-19 case-fatality in the United States: A county-level analysis during the first wave
- Medicine, PloS one
- 2021
County-level variables associated with the COVID-19 case-fatality rate (CFR) using publicly available datasets and a negative binomial generalized linear model are identified to help officials target public health interventions and healthcare resources to locations that are at increased risk of COVID-19 fatalities.
References
SHOWING 1-10 OF 66 REFERENCES
Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation
- Computer Science, American Political Science Review
- 2001
This work adapts an algorithm and uses it to implement a general-purpose, multiple imputation model for missing data that is considerably faster and easier to use than the leading method recommended in the statistics literature.
Multiple imputation by chained equations: what is it and how does it work?
- Psychology, International journal of methods in psychiatric research
- 2011
This paper provides an introduction to the MICE method with a focus on practical aspects and challenges in using this method.
Multiple imputation for missing data via sequential regression trees.
- Computer Science, American journal of epidemiology
- 2010
The authors present a nonparametric approach for implementing multiple imputation via chained equations by using sequential regression trees as the conditional models and demonstrate that the method can result in more plausible imputations, and hence more reliable inferences, in complex settings than the naive application of standard sequential regression imputation techniques.
Multiple imputation using chained equations: Issues and guidance for practice
- Mathematics, Statistics in medicine
- 2011
The principles of the method and how to impute categorical and quantitative variables, including skewed variables, are described, as is the practical analysis of multiply imputed data, including model building and model checking.
A critical look at methods for handling missing covariates in epidemiologic regression analyses.
- Biology, American journal of epidemiology
- 1995
The authors recommend that epidemiologists avoid using the missing-indicator method and use more sophisticated methods whenever a large proportion of data are missing, and contrast the results of multiple imputation to simple methods in the analysis of a case-control study of endometrial cancer.
Multiple Imputation for Missing Data: Making the most of What you Know
- Business
- 2003
This article presents a simulation and data analysis case study using a method for dealing with missing data, multiple imputation, that allows for valid statistical inference with complete case statistical analysis.
Multiple imputation: current perspectives
- Mathematics, Statistical methods in medical research
- 2007
An overview of multiple imputation and current perspectives on its use in medical research, showing how the use of so-called uncongenial imputation models is particularly valuable for sensitivity analyses and also for certain analyses in clinical trial settings.
Performance of Sequential Imputation Method in Multilevel Applications
- Computer Science
- 2009
In most realistic applications, the simulations suggest that the sequential method leads to well-calibrated estimates, and in some settings its performance is even better than that of the more conventional methods with a well-defined joint model.
Multiple imputation in the presence of high-dimensional data
- Computer Science, Statistical methods in medical research
- 2016
Numerical studies show that in the presence of high-dimensional data the standard multiple imputation approach performs poorly and the imputation approach using Bayesian lasso regression achieves, in most cases, better performance than the other imputation methods, including standard imputation using the correctly specified imputation model.
Multiple Imputation After 18+ Years
- Computer Science
- 1996
A description of the assumed context and objectives of multiple imputation is provided, and the multiple imputation framework and its standard results are reviewed.