MICE: Multivariate Imputation by Chained Equations in R

@article{Buuren2011MICEMI,
  title={MICE: Multivariate Imputation by Chained Equations in R},
  author={Stef van Buuren and Karin G. M. Groothuis-Oudshoorn},
  journal={Journal of Statistical Software},
  year={2011},
  volume={45},
  pages={1-67}
}
The R package mice imputes incomplete multivariate data by chained equations. [...] mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs. Imputation of categorical data is improved in order to bypass problems caused by perfect prediction.
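The chained-equations idea summarized above can be illustrated with a minimal Python sketch. This is not the mice API or the package's algorithm as implemented: the function name `chained_equations_impute` is hypothetical, every variable is treated as continuous, and each conditional model is an ordinary least-squares regression with added normal noise. Regression coefficients are held at their estimates rather than drawn from a posterior, so this is a simplification of proper multiple imputation. It also assumes every column has at least one observed value.

```python
import numpy as np

def chained_equations_impute(X, n_iter=10, seed=0):
    """Return one completed copy of X (2-D float array, NaN marks missing)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Start by filling each missing cell with a random observed value
    # from the same column (assumes at least one observed value per column).
    for j in range(X.shape[1]):
        observed = X[~missing[:, j], j]
        filled[missing[:, j], j] = rng.choice(observed, size=missing[:, j].sum())

    for _ in range(n_iter):
        # Visit each incomplete variable in turn: one "chained equation" each.
        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue
            # Predictors: intercept plus the current values of all other columns.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(filled)), others])
            obs_rows = ~missing[:, j]
            # Fit the conditional model on rows where column j was observed.
            beta, *_ = np.linalg.lstsq(A[obs_rows], filled[obs_rows, j], rcond=None)
            resid = filled[obs_rows, j] - A[obs_rows] @ beta
            dof = max(obs_rows.sum() - A.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Replace the missing cells with predictions plus residual noise.
            n_mis = missing[:, j].sum()
            filled[missing[:, j], j] = (
                A[missing[:, j]] @ beta + rng.normal(0.0, sigma, n_mis)
            )
    return filled
```

Calling the function several times with different `seed` values would give several completed data sets, each of which could then be analysed separately before the results are pooled.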
Evaluation of Four Multiple Imputation Methods for Handling Missing Binary Outcome Data in the Presence of an Interaction between a Dummy and a Continuous Variable
TLDR
It is concluded that, when there is an interaction effect between a dummy and a continuous predictor, substantial gains are possible by using recursive partitioning for imputation compared to parametric methods, and that the MICE-Interaction method is always more efficient and convenient for preserving interaction effects than the other methods.
Multiple imputation with multivariate imputation by chained equation (MICE) package.
TLDR
A step-by-step approach to performing MI with the R multivariate imputation by chained equation (MICE) package, in which the results obtained from each analysis are combined.
dynr.mi: An R Program for Multiple Imputation in Dynamic Modeling.
TLDR
dynr.mi(), a function in the R package Dynamic Modeling in R (dynr), is introduced to examine, in the context of a vector autoregressive model, the relationships among individuals' ambulatory physiological measures and self-report affect valence and arousal in a user-specified dynamic systems model via MI.
Multiple imputation by chained equations in praxis: Guidelines and review
TLDR
These guidelines will enable a social science researcher to go through the process of handling missing data while adhering to the newest developments in the field, and incorporate recent innovations on how to handle missing data such as random forests and predictive mean matching.
Comparison of Random Forest and Parametric Imputation Models for Imputing Missing Data Using MICE: A CALIBER Study
TLDR
Parametric MICE was compared with a random forest-based MICE algorithm; random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.
Multiple imputation by chained equations: what is it and how does it work?
TLDR
This paper provides an introduction to the MICE method with a focus on practical aspects and challenges in using this method.
Raoul: An R-Package for Handling Missing Data
TLDR
The Raoul package is computationally faster than its competitors, and its performance is roughly on par with these competitors for all types of missing data at the 10% and 20% levels of missingness, but it fails to compete at the 40% missingness level.
A Comparison of Methods for Creating Multiple Imputations of Nominal Variables
TLDR
A Monte Carlo simulation study compared the performance of the five imputation methods under conditions of varying sample size, percentage of missing data, and number of nominal response categories and found that MICE with polytomous regression was the strongest performer, while the Allison (2002) ranking procedure and MICE with CART performed poorly in most conditions.
MISSING DATA, IMPUTATION AND REGRESSION TREES
TLDR
A large simulation experiment is used to compare the parameter estimation bias of GUIDE and MICE and the prediction accuracy of several model-based and machine learning regression algorithms after GUIDE and MICE imputation.

References

Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box
TLDR
The mi package in R has features that allow the user to get inside the imputation process and evaluate the reasonableness of the resulting models and imputations, and uses Bayesian models and weakly informative prior distributions to construct more stable estimates of imputation models.
How should variable selection be performed with multiply imputed data?
TLDR
Most methods improve on the naïve complete‐case analysis for variable selection, but importantly the type 1 error is only preserved if selection is based on Rubin's rules (RR), which is the recommended approach.
A multivariate technique for multiply imputing missing values using a sequence of regression models
This article describes and evaluates a procedure for imputing missing values for a relatively complex data structure when the data are missing at random. The imputations are obtained by fitting a
Multiple imputation of discrete and continuous data by fully conditional specification
S. van Buuren · Computer Science · Statistical Methods in Medical Research · 2007
TLDR
FCS is a useful and easily applied flexible alternative to JM when no convenient and realistic joint distribution can be specified, and shows that FCS behaves very well in the cases studied.
Multiple imputation of discrete and continuous data by fully conditional specification
TLDR
FCS is a semi-parametric and flexible alternative that specifies the multivariate model by a series of conditional models, one for each incomplete variable, but its statistical properties are difficult to establish.
Evaluation of software for multiple imputation of semi-continuous data
TLDR
The findings of this study showed differences in the performance of the MI programs when imputing semi-continuous data, and caution should be exercised when deciding which program should perform MI on this type of data.
Diagnostics for multivariate imputations
TLDR
This work considers three sorts of diagnostics for random imputations: displays of the completed data, comparisons of the distributions of observed and imputed data values and checks of the fit of observed data to the model that is used to create the imputations.
Multiple Imputation by Chained Equations (MICE): Implementation in Stata
TLDR
Describes ice, an implementation in Stata of the MICE approach to multiple imputation, and uses real data from an observational study in ovarian cancer to illustrate the most important of the many options available with ice.
Multiple Imputation in Practice: Comparison of Software Packages for Regression Models With Missing Variables
TLDR
A number of software packages that implement multiple imputation, originally proposed by Rubin in a public use dataset setting, are described and evaluated, and the interface, features, and results are compared.
Multiple-Imputation Inferences with Uncongenial Sources of Input
TLDR
When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputation models is to incorporate appropriate importance weights into the standard combining rules.