• Corpus ID: 220514317

Predicting feature imputability in the absence of ground truth

Authors: Niamh Mccombe, Xuemei Ding, Girijesh Prasad, David P. Finn, Stephen Todd, Paula L. McClean, KongFatt Wong-Lin
Data imputation is the most popular method of dealing with missing values, but in most real-life applications large amounts of data can be missing, and it is difficult or impossible to evaluate whether data has been imputed accurately (lack of ground truth). This paper addresses these issues by proposing an effective and simple principal component based method for determining whether individual data features can be accurately imputed, i.e. feature imputability. In particular, we establish a strong linear…


Practical Strategies for Extreme Missing Data Imputation in Dementia Diagnosis

This work identified and replicated the extreme missingness structure of data from a real-world memory clinic on a larger open dataset, and found that iterative imputation on the training dataset combined with a reduced-feature classification model provides the best approach, in terms of speed and accuracy.

Shaping a data-driven era in dementia care pathway through computational neurology approaches

The data-driven era for dementia research has arrived with the potential to transform the healthcare system, creating a more efficient, transparent and personalised service for dementia.



MissForest - non-parametric missing value imputation for mixed-type data

In this comparative study, missForest outperforms other methods of imputation especially in data settings where complex interactions and non-linear relations are suspected and the out-of-bag imputation error estimates of missForest prove to be adequate in all settings.

Missing data imputation: focusing on single imputation.

  • Wentao Bao
  • Engineering
    Annals of translational medicine
  • 2016
This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations.

Comparison of imputation methods for missing laboratory data in medicine

MissForest is a highly accurate method of imputation for missing laboratory data and outperforms other common imputation techniques in terms of imputation error and maintenance of predictive ability with imputed values in two clinical predictive models.

Missing Features Reconstruction and Its Impact on Classification Accuracy

This paper focuses on the scenario where entire features are missing, which can be understood as a specific case of transfer learning, and shows that MICE and linear regression are generally good imputers regardless of the conditions.

Does the Missing Data Imputation Method Affect the Composition and Performance of Prognostic Models?

The MICE model showed the best performance, followed by the E-M model, while the final models differed in both composition and performance.

K-Nearest Neighbor in Missing Data Imputation

A comparative study of single imputation techniques such as Mean, Median, and Standard Deviation combined with the k-NN algorithm is proposed, showing better results than Mean Substitution.

MICE: Multivariate Imputation by Chained Equations in R

Mice adds new functionality for imputing multilevel data, automatic predictor selection, data handling, post-processing imputed values, specialized pooling routines, model selection tools, and diagnostic graphs.

Using mutual information for selecting features in supervised neural net learning

  • R. Battiti
  • Computer Science
    IEEE Trans. Neural Networks
  • 1994
This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network.

Soft Modelling by Latent Variables: The Non-Linear Iterative Partial Least Squares (NIPALS) Approach

  • H. Wold
  • Sociology
    Journal of Applied Probability
  • 1975
The NIPALS approach is applied to the ‘soft’ type of model that has come to the fore in sociology and other social sciences in the last five or ten years, namely path models that involve latent