Amelia II: A Program for Missing Data

James Honaker, Gary King, and Matthew Blackwell
Journal of Statistical Software
Amelia II is a complete R package for multiple imputation of missing data. The package implements a new expectation-maximization with bootstrapping algorithm that works faster, with larger numbers of variables, and is far easier to use, than various Markov chain Monte Carlo approaches, but gives essentially the same answers. The program also improves imputation models by allowing researchers to put Bayesian priors on individual cell values, thereby including a great deal of potentially valuable… 
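The idea of combining expectation-maximization with bootstrapping can be sketched on a toy problem. The following is a minimal illustration in Python, not the package's implementation: it assumes a bivariate normal model in which `x` is always observed and `y` is sometimes missing, and every name in it (`em_bivariate`, `emb_impute`, the simulated `data`) is hypothetical. Each imputed data set comes from a fresh bootstrap resample of the rows, an EM fit on that resample, and draws of the missing values from the fitted conditional distribution.

```python
import random


def em_bivariate(rows, iters=200):
    """EM estimates for a bivariate normal; x is always observed, y may be None.

    Uses a fixed number of iterations instead of a convergence check (a simplification).
    """
    n = len(rows)
    xs = [x for x, _ in rows]
    ys_obs = [y for _, y in rows if y is not None]
    mu_x = sum(xs) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    mu_y = sum(ys_obs) / len(ys_obs)
    var_y = sum((y - mu_y) ** 2 for y in ys_obs) / len(ys_obs)
    cov = 0.0
    for _ in range(iters):
        beta = cov / var_x
        resid = max(var_y - beta * cov, 0.0)  # conditional variance of y given x
        s_y = s_yy = s_xy = 0.0
        for x, y in rows:
            if y is None:
                # E-step: expected y and y^2 given x under the current parameters
                e_y = mu_y + beta * (x - mu_x)
                e_yy = e_y ** 2 + resid
            else:
                e_y, e_yy = y, y * y
            s_y += e_y
            s_yy += e_yy
            s_xy += x * e_y
        # M-step: update parameters from the expected sufficient statistics
        mu_y = s_y / n
        var_y = s_yy / n - mu_y ** 2
        cov = s_xy / n - mu_x * mu_y
    return mu_x, mu_y, var_x, var_y, cov


def emb_impute(rows, m=5, seed=0):
    """Return m completed copies of rows, one per bootstrap resample."""
    rng = random.Random(seed)
    completed = []
    for _ in range(m):
        # Resample rows with replacement; retry if too few y values are observed.
        while True:
            boot = [rng.choice(rows) for _ in rows]
            if sum(y is not None for _, y in boot) >= 2:
                break
        mu_x, mu_y, var_x, var_y, cov = em_bivariate(boot)
        beta = cov / var_x
        sd = max(var_y - beta * cov, 0.0) ** 0.5
        # Fill each missing y in the ORIGINAL data with a draw from y | x.
        completed.append([
            (x, y if y is not None else rng.gauss(mu_y + beta * (x - mu_x), sd))
            for x, y in rows
        ])
    return completed


# Simulated data (made up for illustration): y = 2x + noise, about 30% of y missing.
sim = random.Random(1)
data = []
for _ in range(200):
    x = sim.gauss(0, 1)
    y = 2 * x + sim.gauss(0, 0.5)
    data.append((x, None if sim.random() < 0.3 else y))

completed = emb_impute(data, m=5)
```

Observed values are copied through unchanged; only the missing cells differ across the five completed data sets, which is what carries the imputation uncertainty into later analyses.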
imputeMulti: Imputation for Multivariate Multinomial Missing Data
The imputeMulti package’s functionality is introduced, a hands-on approach to solving multinomial missing data problems is provided, and the package is shown to handle large datasets capably.
gcimpute: A Package for Missing Data Imputation
The gcimpute package can impute missing data with many different variable types, including continuous, binary, ordinal, count, and truncated values, by modeling data as samples from a Gaussian copula model that enables fast inference, imputation with confidence intervals, and multiple imputation.
State of the Multiple Imputation Software.
  • R. Yucel
  • Computer Science
    Journal of statistical software
  • 2011
A brief history of multiple imputation and relevant software is provided, along with the contents of the contributions and potential directions for future software development.
EM-based stepwise regression imputation using standard and robust methods
The aim of this contribution is to propose an automatic algorithm called IRMI for iterative model-based imputation using robust methods, and to provide a software tool in R for this algorithm to be compared to the algorithm IVEWARE.
Deletion Practices in the Era of Permanent Digital Memory
Amelia II performs multiple imputation, a general-purpose approach to data with missing values, which creates multiple “filled in” or rectangularized versions of the incomplete data set so that analyses which require complete observations can appropriately use all the information present in a data set with missingness.
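Once each completed data set has been analyzed separately, the per-data-set results are usually combined with Rubin's rules. The sketch below, in Python, assumes a scalar estimate and its squared standard error from each of the m completed data sets; the function name `pool_rubin` and the example numbers are hypothetical and purely illustrative.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances using Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up estimates and squared standard errors from three completed data sets.
pooled_estimate, pooled_variance = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The between-imputation term is what distinguishes this from averaging: it inflates the variance to reflect uncertainty about the missing values.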
A large simulation experiment is used to compare the parameter estimation bias of GUIDE and MICE and the prediction accuracy of several model-based and machine learning regression algorithms after GUIDE and MICE imputation.
Multiple imputation for time series data with Amelia package.
The article illustrates how to perform MI using the Amelia package in a clinical scenario, and how to check the imputations by examining the distributions of imputed and observed values or by using the over-imputation technique.
Interpreting Zelig: Everyone’s Statistical Software
A new version of Zelig is introduced that has been written using R’s Reference Classes and makes the generalized information matrix test available for all appropriate models, and it integrates with R libraries for multiple imputation, counterfactual analysis, and causal inference.
Multiple imputation for continuous variables using a Bayesian principal component analysis
ABSTRACT We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation


Multiple Imputation for Multivariate Missing-Data Problems: A Data Analyst's Perspective.
The key ideas of multiple imputation are reviewed, the software programs currently available are discussed, and their use on data from the Adolescent Alcohol Prevention Trial is demonstrated.
Diagnostics for multivariate imputations
This work considers three sorts of diagnostics for random imputations: displays of the completed data, comparisons of the distributions of observed and imputed data values and checks of the fit of observed data to the model that is used to create the imputations.
Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation
This work adapts an algorithm and uses it to implement a general-purpose, multiple imputation model for missing data that is considerably faster and easier to use than the leading method recommended in the statistics literature.
Bootstrap for Imputed Survey Data
It is shown that correct bootstrap estimates can be obtained by imitating the process of imputing the original data set in the bootstrap resampling, and the proposed bootstrap is asymptotically valid irrespective of the sampling design, the imputation method, or the type of statistic used in inference.
Missing Data, Imputation, and the Bootstrap
Three main topics are discussed: bootstrap methods for missing data, these methods' relationship to the theory of multiple imputation, and computationally efficient ways of executing them.
Multiple Imputation for Interval Estimation from Simple Random Samples with Ignorable Nonresponse
Abstract Several multiple imputation techniques are described for simple random samples with ignorable nonresponse on a scalar outcome variable. The methods are compared using both analytic and Monte
Robust Statistical Modeling Using the t Distribution
Abstract The t distribution provides a useful extension of the normal for statistical modeling of data sets involving errors with longer-than-normal tails. An analytical strategy based on maximum
What to Do about Missing Values in Time‐Series Cross‐Section Data
A multiple imputation model is built that allows smooth time trends, shifts across cross‐sectional units, and correlations over time and space, resulting in far more accurate imputations, and enables analysts to incorporate knowledge from area studies experts via priors on individual missing cell values, rather than on difficult‐to‐interpret model parameters.
Robust Estimation of the Mean and Covariance Matrix from Data with Missing Values
SUMMARY Methods of Rubin (1983) for robust estimation of a mean and covariance matrix and associated parameters are extended to analyse data with missing values. The methods are maximum likelihood
Making the Most Of Statistical Analyses: Improving Interpretation and Presentation
This article offers an approach, built on the technique of statistical simulation, to extract the currently overlooked information from any statistical method and to interpret and present it in a reader-friendly manner.