• Corpus ID: 7104840

Multiple imputation by chained equations in praxis: Guidelines and review

@article{Wulff2017MultipleIB,
  title={Multiple imputation by chained equations in praxis: Guidelines and review},
  author={Jesper N. Wulff and Linda Ejlskov Jeppesen},
  journal={The Electronic Journal of Business Research Methods},
  year={2017},
  volume={15},
  pages={41--56}
}
Multiple imputation by chained equations (MICE) is an effective tool to handle missing data, an almost unavoidable problem in quantitative data analysis. However, despite the empirical and theoretical evidence supporting the use of MICE, researchers in the social sciences often resort to inferior approaches, unnecessarily risking erroneous results. The complexity of the decision process when encountering missing data may be what is discouraging potential users from adopting the appropriate…

Addressing missing values in routine health information system data: an evaluation of imputation methods using data from the Democratic Republic of the Congo during the COVID-19 pandemic

Seven commonly used imputation methods are implemented, and multiple imputation is recommended for addressing missing values in RHIS datasets, along with appropriate handling of data structure to minimize imputation standard errors.

Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values

This paper proposes an integrated approach based on decision trees that does not require a separate process of imputation and learning: it trains a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and optimizes a fairness-regularized objective function.

A multiple imputation approach to evaluate the accuracy of diagnostic tests in presence of missing values

The article aims to use a multiple imputation approach to evaluate binary diagnostic tests with missing data under the MCAR mechanism and the proposed approach is applied to a real data set.

biokNN: A bi-objective imputation method for multilevel data in R

The biokNN package provides functions to produce single and multiple imputation for data with continuous variables along with visualization tools to analyze the structure of the missing values among classes and variables.

Causal Inference via Nonlinear Variable Decorrelation for Healthcare Applications

A novel method uses a variable decorrelation regularizer to handle both linear and nonlinear confounding and employs association rules, mined from the original features, as new representations to further approximate human decision patterns and increase model interpretability.

Filling the gaps: imputation of missing metrics' values in a software quality model

A few imputation methods are empirically validated in the context of a custom Géant-QM framework used for evaluation of several open source systems; results indicate that imputing a missing value based on its close neighbors as data donors introduces less noise than using a wider set of donors.

Machine Learning in the Analysis of Social Problems: The Case of Global Human Trafficking

Results show that MICE had a level of effectiveness in handling missing data, while agglomerative hierarchical clustering was successful in identifying distinct and describable clusters from the three time periods into which the imputed dataset was segmented.

Tuberculosis treatment outcomes of notified cases: trends and determinants of potential unfavourable outcome, France, 2008 to 2014

Monitoring of treatment outcome is improving over time, but treatment outcome monitoring needs to be strengthened in cases belonging to population groups where the percentage of unfavourable outcome is the highest and in cases where surveillance data shows poorer documented follow-up.

Risk factors for increased COVID-19 case-fatality in the United States: A county-level analysis during the first wave

County-level variables associated with the COVID-19 case-fatality rate (CFR) using publicly available datasets and a negative binomial generalized linear model are identified to help officials target public health interventions and healthcare resources to locations that are at increased risk of COVID-19 fatalities.

...

References

Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation

This work adapts an algorithm and uses it to implement a general-purpose, multiple imputation model for missing data that is considerably faster and easier to use than the leading method recommended in the statistics literature.

Multiple imputation by chained equations: what is it and how does it work?

This paper provides an introduction to the MICE method with a focus on practical aspects and challenges in using this method.

Multiple imputation for missing data via sequential regression trees.

The authors present a nonparametric approach for implementing multiple imputation via chained equations by using sequential regression trees as the conditional models and demonstrate that the method can result in more plausible imputations, and hence more reliable inferences, in complex settings than the naive application of standard sequential regression imputation techniques.

Multiple imputation using chained equations: Issues and guidance for practice

The principles of the method and how to impute categorical and quantitative variables, including skewed variables, are described and shown and the practical analysis of multiply imputed data is described, including model building and model checking.

A critical look at methods for handling missing covariates in epidemiologic regression analyses.

The authors recommend that epidemiologists avoid using the missing-indicator method and use more sophisticated methods whenever a large proportion of data are missing, and contrast the results of multiple imputation to simple methods in the analysis of a case-control study of endometrial cancer.

Multiple Imputation for Missing Data: Making the most of What you Know

This article presents a simulation and data analysis case study using a method for dealing with missing data, multiple imputation, that allows for valid statistical inference with complete case statistical analysis.

Multiple imputation: current perspectives

An overview of multiple imputation and current perspectives on its use in medical research, showing how the use of so-called uncongenial imputation models is particularly valuable for sensitivity analyses and also for certain analyses in clinical trial settings.

Performance of Sequential Imputation Method in Multilevel Applications

In most realistic applications, the simulations suggest that the sequential method leads to well-calibrated estimates, and in some settings its performance is even better than that of more conventional methods with a well-defined joint model.

Multiple imputation in the presence of high-dimensional data

Numerical studies show that in the presence of high-dimensional data the standard multiple imputations approach performs poorly and the imputation approach using Bayesian lasso regression achieves, in most cases, better performance than the other imputation methods including the standard imputations using the correctly specified imputation model.

Multiple Imputation After 18+ Years

A description of the assumed context and objectives of multiple imputation is provided, and the multiple imputation framework and its standard results are reviewed.
...