A principal component method to impute missing values for mixed data

@article{Audigier2016APC,
  title={A principal component method to impute missing values for mixed data},
  author={Vincent Audigier and François Husson and Julie Josse},
  journal={Advances in Data Analysis and Classification},
  year={2016},
  volume={10},
  pages={5--26}
}
We propose a new method to impute missing values in mixed data sets. It is based on a principal component method, the factorial analysis for mixed data, which balances the influence of all the variables that are continuous and categorical in the construction of the principal components. Because the imputation uses the principal axes and components, the prediction of the missing values is based on the similarity between individuals and on the relationships between variables. The properties of… 
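The general idea of imputing through principal axes and components can be illustrated with a minimal sketch. It is not the authors' implementation and not the missMDA API. It assumes continuous variables only, whereas the method described above also handles categorical variables, and it omits any regularization. The function name `iterative_pca_impute` and its parameters are hypothetical. The sketch alternates between fitting a low-rank PCA on the currently completed table and replacing the missing cells with the fitted values.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=200, tol=1e-8):
    """Fill NaNs by alternating a rank-limited PCA fit and re-imputation.

    Hypothetical helper for illustration: continuous variables only,
    no regularization, no categorical encoding.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(max_iter):
        # Standardize the completed table so every variable has equal weight.
        mean = filled.mean(axis=0)
        std = filled.std(axis=0)
        std[std == 0] = 1.0
        Z = (filled - mean) / std
        # Rank-limited reconstruction from the leading principal axes.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Z_hat = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        X_hat = Z_hat * std + mean
        # Only missing cells are updated; observed cells are never altered.
        new_filled = np.where(missing, X_hat, X)
        change = np.sum((new_filled - filled) ** 2)
        filled = new_filled
        if change < tol:
            break
    return filled
```

Because each fitted value combines an individual's coordinates on the components with the variables' loadings on the axes, an imputed cell borrows information both from similar rows and from correlated columns, which is the property the abstract points to.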

missMDA: A Package for Handling Missing Values in Multivariate Data Analysis

We present the R package missMDA which performs principal component methods on incomplete data sets, aiming to obtain scores, loadings and graphical representations despite missing values. Package

Multiple imputation for continuous variables using a Bayesian principal component analysis

We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation

Missing Value Imputation for Mixed Data Through Gaussian Copula

TLDR
A new semiparametric algorithm to impute missing values, with no tuning parameters, is proposed, which models mixed data as a Gaussian copula and reveals the statistical associations among variables.

Missing Value Imputation for Mixed Data via Gaussian Copula

TLDR
A new semiparametric algorithm to impute missing values, with no tuning parameters, is proposed, which models mixed data as a Gaussian copula and reveals the statistical associations among variables.

MIMCA: multiple imputation for categorical variables with multiple correspondence analysis

TLDR
The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main effects logistic regression model, and a reliable estimate of the variability of the estimators.

Contribution to missing values & principal component methods

This manuscript was written for the Habilitation à Diriger des Recherches and it describes my research activities. The first part of this manuscript is named "A missing values tour with principal

Missing Data Imputation and Its Effect on the Accuracy of Classification

TLDR
This analysis finds that missing data imputation using hot deck, iterative robust model-based imputation, factorial analysis for mixed data and Random Forest Imputation perform in a similar manner regardless of the amount of missing data and have the highest mean percentage of observations correctly classified.

Principal Components Analysis based frameworks for efficient missing data imputation algorithms

TLDR
Principal Component Analysis Imputation (PCAI) is proposed, a simple but versatile framework based on Principal Component Analysis (PCA) to speed up the imputation process and alleviate memory issues of many available imputation techniques, without sacrificing imputation quality in terms of MSE.

How handling missing data may impact conclusions: A comparison of six different imputation methods for categorical questionnaire data

TLDR
Six conceptually different multiple imputation methods are described and compared, alongside the commonly used complete case analysis, to explore whether the choice of methodology for handling missing data might impact clinical conclusions drawn from a regression model when data are categorical.

References


missMDA: A Package for Handling Missing Values in Multivariate Data Analysis

We present the R package missMDA which performs principal component methods on incomplete data sets, aiming to obtain scores, loadings and graphical representations despite missing values. Package

MissForest - non-parametric missing value imputation for mixed-type data

TLDR
In this comparative study, missForest outperforms other methods of imputation especially in data settings where complex interactions and non-linear relations are suspected and the out-of-bag imputation error estimates of missForest prove to be adequate in all settings.

Missing value estimation methods for DNA microarrays

TLDR
It is shown that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros).

Handling missing values in exploratory multivariate data analysis methods

TLDR
A regularized iterative PCA algorithm to provide point estimates of the principal axes and components and to overcome the major issue of overfitting is described and implemented in the R package missMDA.

Multiple imputation of missing blood pressure covariates in survival analysis.

TLDR
A non-response problem in survival analysis where the occurrence of missing data in the risk factor is related to mortality is studied, and multiple imputation is used to impute missing blood pressure and then analyse the data under a variety of non-response models.

Multiple imputation of discrete and continuous data by fully conditional specification

  • S. van Buuren
  • Computer Science
  • Statistical Methods in Medical Research
  • 2007
TLDR
FCS is a useful and easily applied flexible alternative to JM when no convenient and realistic joint distribution can be specified, and shows that FCS behaves very well in the cases studied.

Practical Approaches to Principal Component Analysis in the Presence of Missing Values

TLDR
A probabilistic formulation of PCA provides a good foundation for handling missing values, and formulas for doing that are provided, and a novel fast algorithm is introduced and extended to variational Bayesian learning.

MULTIPLE IMPUTATION OF INCOMPLETE CATEGORICAL DATA USING LATENT CLASS ANALYSIS

TLDR
The proposed multiple imputation method, which is implemented in Latent GOLD software for latent class analysis, is illustrated with two examples and compared with well-established methods such as maximum likelihood.

Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis

TLDR
The overfitting problem is pointed out and a regularized version of the algorithm is proposed to overcome this major issue and results are promising with respect to other methods such as the missing-data passive modified margin method.

Selecting the number of components in principal component analysis using cross-validation approximations
