IFGAN: Missing Value Imputation using Feature-specific Generative Adversarial Networks

@article{Qiu2020IFGANMV,
  title={IFGAN: Missing Value Imputation using Feature-specific Generative Adversarial Networks},
  author={Wei Qiu and Yangsibo Huang and Quanzheng Li},
  journal={2020 IEEE International Conference on Big Data (Big Data)},
  year={2020},
  pages={4715--4723}
}
Missing value imputation is a challenging and well-researched topic in data mining. In this paper, we propose IFGAN, a missing value imputation algorithm based on Feature-specific Generative Adversarial Networks (GAN). Our idea is intuitive yet effective: a feature-specific generator is trained to impute missing values, while a discriminator is expected to distinguish the imputed values from observed ones. The proposed architecture is capable of handling different data types, data…
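The generator/discriminator split described in the abstract can be illustrated with a minimal NumPy sketch of the data flow for a single feature j. It assumes a GAIN-style setup: the feature-specific generator sees the other features plus their masks, its output is used only where feature j is missing, and the discriminator is scored on whether it can tell which entries of column j were observed. The linear generator, the logistic discriminator (`generator_j`, `discriminator_j`, `feature_losses`) and their random weights are hypothetical placeholders, not the authors' architecture, and only the losses are computed; no training is performed.

```python
import numpy as np


def generator_j(g_in, w):
    """Hypothetical linear generator for feature j (stands in for a neural network)."""
    return g_in @ w


def discriminator_j(x_hat, w):
    """Hypothetical logistic discriminator: P(entry of column j was observed)."""
    return 1.0 / (1.0 + np.exp(-(x_hat @ w)))


def feature_losses(X, M, j, w_g, w_d, eps=1e-8):
    """Impute column j and score it; X is zero-filled where M == 0 (1 = observed)."""
    others = np.delete(np.arange(X.shape[1]), j)
    # Generator input: the other features plus their missingness masks.
    g_in = np.hstack([X[:, others] * M[:, others], M[:, others]])
    g_out = generator_j(g_in, w_g)
    m = M[:, j]
    # Keep observed values; use the generator output only where column j is missing.
    X_hat = X.copy()
    X_hat[:, j] = m * X[:, j] + (1 - m) * g_out
    p = discriminator_j(X_hat, w_d)
    # Discriminator loss: binary cross-entropy against the true mask of column j.
    d_loss = -np.mean(m * np.log(p + eps) + (1 - m) * np.log(1 - p + eps))
    # Generator loss: push imputed entries (m == 0) toward being judged "observed".
    g_loss = -np.sum((1 - m) * np.log(p + eps)) / max((1 - m).sum(), 1.0)
    return X_hat, g_loss, d_loss


rng = np.random.default_rng(0)
n, d, j = 8, 4, 2
M = (rng.random((n, d)) > 0.3).astype(float)
X = rng.normal(size=(n, d)) * M  # zero-fill the missing entries
w_g = rng.normal(size=2 * (d - 1))
w_d = rng.normal(size=d)
X_hat, g_loss, d_loss = feature_losses(X, M, j, w_g, w_d)
```

In a full model the two losses would be minimized alternately, one generator per feature; the sketch only shows how observed values are preserved and how the mask serves as the discriminator's target.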
2 Citations


FragmGAN: Generative Adversarial Nets for Fragmentary Data Imputation and Prediction
TLDR
FragmGAN comes with theoretical guarantees for imputation when data are Missing At Random (MAR), requires no hint mechanism, and shows significant advantages in predictive performance in extensive experiments.
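The Missing At Random (MAR) condition is easiest to see next to Missing Completely At Random (MCAR). The sketch below is a simplified illustration, not FragmGAN's own procedure: it generates both kinds of mask for a two-column dataset. Under MCAR every entry of column 1 is dropped with the same probability; under MAR the probability of dropping column 1 depends only on the always-observed column 0. The function names, the logistic link and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def mcar_mask(n, p_missing, rng):
    """Mask for column 1 (1 = observed): every entry missing with the same probability."""
    return (rng.random(n) >= p_missing).astype(int)


def mar_mask(x0, strength, rng):
    """Mask for column 1 whose missingness depends only on the fully observed column 0."""
    p_missing = 1.0 / (1.0 + np.exp(-strength * x0))  # hypothetical logistic link
    return (rng.random(x0.shape[0]) >= p_missing).astype(int)


rng = np.random.default_rng(42)
n = 10_000
x0 = rng.normal(size=n)
x1 = x0 + rng.normal(scale=0.5, size=n)  # column 1 is correlated with column 0

m_mcar = mcar_mask(n, 0.3, rng)
m_mar = mar_mask(x0, 2.0, rng)

# Under MCAR the observed rows look like the full data. Under MAR they are skewed
# toward small x0, so the naive observed mean of x1 is biased; the bias can be
# corrected by conditioning on x0, which is what MAR-based imputers exploit.
print(x1.mean(), x1[m_mcar == 1].mean(), x1[m_mar == 1].mean())
```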

References

Showing 1-10 of 29 references
GAIN: Missing Data Imputation using Generative Adversarial Nets
TLDR
This work proposes GAIN, a novel method for imputing missing data that adapts the well-known Generative Adversarial Nets (GAN) framework and significantly outperforms state-of-the-art imputation methods.
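GAIN's hint mechanism (the one FragmGAN says it does not need) can be sketched in a few lines. This follows the hint formula H = B ⊙ M + 0.5(1 − B) as we understand it from the GAIN paper, where B is all ones except for one uniformly chosen feature per row; public implementations may use a different hint scheme. The function names `gain_hint` and `gain_combine` are hypothetical.

```python
import numpy as np


def gain_hint(M, rng):
    """Hint matrix H = B * M + 0.5 * (1 - B), with one hidden entry per row.

    For each row a feature index k is drawn uniformly and B is 1 everywhere
    except at k. The discriminator is thus told the true mask for every entry
    except k, where it only sees 0.5 and must decide for itself.
    """
    n, d = M.shape
    B = np.ones((n, d))
    k = rng.integers(0, d, size=n)
    B[np.arange(n), k] = 0.0
    return B * M + 0.5 * (1.0 - B)


def gain_combine(X, M, G_out):
    """Keep observed values (M == 1) and fill missing ones with generator output."""
    return M * X + (1.0 - M) * G_out


rng = np.random.default_rng(1)
M = (rng.random((5, 4)) > 0.25).astype(float)
H = gain_hint(M, rng)
```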
MIDA: Multiple Imputation Using Denoising Autoencoders
TLDR
Evaluation on several real-life datasets shows that the proposed multiple imputation model, based on overcomplete deep denoising autoencoders, significantly outperforms current state-of-the-art methods under varying conditions while also improving end-of-the-line analytics.
Generative Adversarial Nets
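We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G.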
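The adversarial process described here is usually written as a two-player minimax game. In the GAN paper's notation, with $p_{\text{data}}$ the data distribution, $p_z$ the prior on the generator's noise input and $p_g$ the distribution of generated samples, the value function and the optimal discriminator for a fixed $G$ are:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right],
\qquad
D^{*}_{G}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}.
```

Imputation GANs such as GAIN adapt this game so that D predicts, entry by entry, which values were observed, rather than whether a whole sample is real.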
Context Encoders: Feature Learning by Inpainting
TLDR
It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
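The training objective behind this result combines a masked reconstruction term with an adversarial term. As we recall the Context Encoders formulation (notation may differ slightly from the paper), $\hat{M}$ is a binary mask equal to 1 on the dropped region, $F$ is the encoder-decoder, $\odot$ is element-wise multiplication, and $\lambda_{rec}, \lambda_{adv}$ weight the two terms:

```latex
\begin{aligned}
\mathcal{L}_{rec}(x) &= \left\lVert \hat{M} \odot \left(x - F\big((1 - \hat{M}) \odot x\big)\right) \right\rVert_2^2 \\
\mathcal{L}_{adv} &= \max_D \; \mathbb{E}_{x \in \mathcal{X}}\Big[\log D(x) + \log\Big(1 - D\big(F((1 - \hat{M}) \odot x)\big)\Big)\Big] \\
\mathcal{L} &= \lambda_{rec}\,\mathcal{L}_{rec} + \lambda_{adv}\,\mathcal{L}_{adv}
\end{aligned}
```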
XGBoost: A Scalable Tree Boosting System
TLDR
This paper proposes a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, and provides insights into cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
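The sparsity-aware idea is directly relevant to missing values: each split learns a default direction for rows whose feature is missing. The sketch below is an exact greedy search on one feature using the split-gain formula from the XGBoost paper, trying both default directions for the missing rows at every threshold. It is an illustration, not the library's implementation; `lam` (L2 penalty on leaf weights), `gamma` (per-split complexity penalty) and the function names are our own.

```python
import numpy as np


def split_gain(GL, HL, GR, HR, lam, gamma):
    """Gain = 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - (GL+GR)^2/(HL+HR+lam)] - gamma."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def best_split_with_default(x, g, h, lam=1.0, gamma=0.0):
    """Return (gain, threshold, default_left) for one feature containing NaNs.

    Rows with x < threshold go left; rows with missing x go left iff default_left.
    """
    missing = np.isnan(x)
    G_miss, H_miss = g[missing].sum(), h[missing].sum()
    xs, gs, hs = x[~missing], g[~missing], h[~missing]
    order = np.argsort(xs)
    xs, gs, hs = xs[order], gs[order], hs[order]
    G_tot, H_tot = gs.sum(), hs.sum()

    best = (-np.inf, None, None)
    GL = HL = 0.0
    for i in range(len(xs) - 1):
        GL += gs[i]
        HL += hs[i]
        if xs[i] == xs[i + 1]:
            continue  # no threshold fits between equal values
        thr = 0.5 * (xs[i] + xs[i + 1])
        GR, HR = G_tot - GL, H_tot - HL
        # Option 1: missing rows follow the left branch.
        gain_l = split_gain(GL + G_miss, HL + H_miss, GR, HR, lam, gamma)
        # Option 2: missing rows follow the right branch.
        gain_r = split_gain(GL, HL, GR + G_miss, HR + H_miss, lam, gamma)
        if gain_l > best[0]:
            best = (gain_l, thr, True)
        if gain_r > best[0]:
            best = (gain_r, thr, False)
    return best


x = np.array([1.0, 2.0, 3.0, 4.0, np.nan, np.nan])
g = np.array([-2.0, -2.0, 2.0, 2.0, -2.0, -2.0])  # first-order gradients
h = np.ones(6)                                     # second-order gradients
gain, thr, default_left = best_split_with_default(x, g, h)
```

Here the missing rows share the gradient sign of the small-x rows, so the search sends them left at threshold 2.5.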
Image Inpainting for Irregular Holes Using Partial Convolutions
TLDR
This work proposes the use of partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels; the approach outperforms other methods for irregular masks.
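The masking and renormalization step can be shown in one dimension. The sketch below is a simplified 1-D analogue of a partial convolution (valid padding, stride 1, single channel) as we understand the rule: each output uses only valid inputs in its window, rescales by window size over the number of valid inputs, outputs 0 when the window has no valid input, and marks the output location valid whenever at least one input was. The function name and the example data are hypothetical.

```python
import numpy as np


def partial_conv1d(x, mask, w, b=0.0):
    """1-D partial convolution; returns (output, updated mask)."""
    k = len(w)
    n_out = len(x) - k + 1
    out = np.zeros(n_out)
    new_mask = np.zeros(n_out)
    for i in range(n_out):
        xm = x[i:i + k] * mask[i:i + k]  # zero out invalid (hole) inputs
        valid = mask[i:i + k].sum()
        if valid > 0:
            # Renormalize so the response does not shrink as holes grow.
            out[i] = np.dot(w, xm) * (k / valid) + b
            new_mask[i] = 1.0  # this location now counts as valid
    return out, new_mask


x = np.array([1.0, 2.0, 99.0, 4.0, 5.0, 6.0, 7.0])  # 99.0 sits inside a hole
mask = np.array([1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0])
w = np.ones(3) / 3                                  # averaging kernel
out, new_mask = partial_conv1d(x, mask, w)
```

Values under the hole never influence the output, and the mask shrinks layer by layer until a window with no valid input remains.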
Strategies for Handling Missing Data in Electronic Health Record Derived Data
TLDR
This paper focuses on analytical approaches for handling missing data in EHR data, primarily multiple imputation, and notes that the broad range of variables available in typical EHR systems provides a wealth of information for mitigating potential biases caused by missing data.
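Multiple imputation ends with a pooling step: each of the m imputed datasets is analyzed separately and the results are combined with Rubin's rules, where the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below implements these standard rules for a single scalar parameter; the function name and the input numbers are made up for illustration.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's rules.

    estimates: point estimates Q_i, one per imputed dataset.
    variances: their squared standard errors U_i.
    Returns (pooled estimate, total variance).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    q_bar = q.mean()                 # pooled point estimate
    w_bar = u.mean()                 # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    total = w_bar + (1 + 1 / m) * b  # total variance
    return q_bar, total


est, var = pool_rubin([2.1, 1.9, 2.3, 2.0, 2.2], [0.04, 0.05, 0.04, 0.06, 0.05])
```

The between-imputation term is what single imputation silently drops, which is why single-imputation standard errors tend to be too small.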
Spectral Regularization Algorithms for Learning Large Incomplete Matrices
TLDR
Using the nuclear norm as a regularizer, the algorithm Soft-Impute iteratively replaces the missing elements with those obtained from a soft-thresholded SVD in a sequence of regularized low-rank solutions for large-scale matrix completion problems.
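The Soft-Impute iteration summarized above can be sketched for a single regularization value λ: fill the missing entries with the current estimate, take an SVD, soft-threshold the singular values by λ, and repeat until the relative change is small. The full algorithm in the paper runs this along a decreasing sequence of λ values with warm starts; the function name, the stopping parameters and the synthetic rank-2 test matrix below are our assumptions.

```python
import numpy as np


def soft_impute(X, mask, lam, n_iter=500, tol=1e-6):
    """Soft-Impute at one lam; mask is True where X is observed."""
    Z = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        filled = np.where(mask, X, Z)            # observed entries from X, rest from Z
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s - lam, 0.0)      # soft-threshold the singular values
        Z_new = (U * s_shrunk) @ Vt
        # Relative change ||Z_new - Z||_F^2 / ||Z||_F^2 as the convergence test.
        delta = np.linalg.norm(Z_new - Z) ** 2 / max(np.linalg.norm(Z) ** 2, 1e-12)
        Z = Z_new
        if delta < tol:
            break
    return Z


rng = np.random.default_rng(0)
A = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))  # rank-2 ground truth
mask = rng.random(A.shape) > 0.3                         # roughly 70% observed
X = np.where(mask, A, 0.0)
Z = soft_impute(X, mask, lam=1.0)
rmse_missing = np.sqrt(np.mean((Z - A)[~mask] ** 2))
```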
Multiple imputation by chained equations: what is it and how does it work?
TLDR
This paper provides an introduction to the MICE method with a focus on practical aspects and challenges in using this method.
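The chained-equations idea behind MICE can be sketched as follows: start with a simple fill, then cycle through the incomplete variables, regressing each on all the others (fitted on rows where it is observed) and replacing its missing entries with predictions plus random residual noise. This is a simplified, hypothetical `mice_linear` helper: real MICE also draws the regression parameters from their posterior (or uses methods such as predictive mean matching), chooses a model per variable type, and repeats the process to produce several imputed datasets.

```python
import numpy as np


def mice_linear(X, n_cycles=10, rng=None):
    """Return one imputed copy of X (np.nan = missing) via chained linear regressions."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.array(X, dtype=float)  # work on a float copy
    miss = np.isnan(X)
    # Start: fill every missing entry with its column's observed mean.
    X = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_cycles):
        for j in range(X.shape[1]):
            mj = miss[:, j]
            if not mj.any():
                continue  # nothing to impute in this column
            obs = ~mj
            # Regress column j on an intercept and all other (currently filled)
            # columns, fitting only on rows where column j was actually observed.
            A = np.column_stack([np.ones(X.shape[0]), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            resid = X[obs, j] - A[obs] @ coef
            p = A.shape[1]
            sigma = np.sqrt(resid @ resid / (obs.sum() - p)) if obs.sum() > p else 0.0
            # Replace missing entries with a prediction plus a residual-noise draw.
            X[mj, j] = A[mj] @ coef + rng.normal(0.0, sigma, mj.sum())
    return X


rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + rng.normal(scale=0.1, size=n)
X_full = np.column_stack([x0, x1])
X_miss = X_full.copy()
X_miss[rng.random(n) < 0.2, 1] = np.nan  # drop about 20% of column 1
X_imp = mice_linear(X_miss, rng=rng)
```

Running the function several times with different seeds yields the multiple completed datasets whose analyses are then pooled.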
Imputation and quality control steps for combining multiple genome-wide datasets
TLDR
The eMERGE imputed dataset will serve as a valuable resource for discovery that leverages the clinical data mined from the EHR; the paper also examines the relationship between allelic R² and minor allele frequency.
...