Autoencoder-based Attribute Noise Handling Method for Medical Data

@article{Ranvier2022AutoencoderbasedAN,
  title={Autoencoder-based Attribute Noise Handling Method for Medical Data},
  author={Thomas Ranvier and Haytham Elghazel and Emmanuel Coquery and Khalid Benabdeslem},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.10609}
}
Medical datasets are particularly subject to attribute noise, that is, missing and erroneous values. Attribute noise is known to be largely detrimental to learning performance, so it is essential to deal with it before any inference. We propose a simple autoencoder-based preprocessing method that can correct mixed-type tabular data corrupted by attribute noise. To our knowledge, no other method handles both kinds of attribute noise in tabular data. We…
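
The abstract describes the approach only at a high level; as a rough illustration, here is a minimal, hypothetical sketch of an autoencoder-based correction in PyTorch. The architecture, corruption scheme, and training loop are assumptions made for the sketch, not the authors' exact method.

import torch
import torch.nn as nn

# Hypothetical denoising autoencoder used to reconstruct corrupted tabular rows.
class DenoisingAE(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def correct(X, observed_mask, epochs=200, lr=1e-3):
    # X: (n, d) float tensor with missing entries filled by 0.
    # observed_mask: (n, d) bool tensor, True where the value is observed.
    model = DenoisingAE(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        noisy = X + 0.1 * torch.randn_like(X) * observed_mask  # corrupt inputs
        recon = model(noisy)
        loss = ((recon - X)[observed_mask] ** 2).mean()  # loss on observed cells only
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        recon = model(X)
    # Keep observed values; replace missing or suspect entries with reconstructions.
    return torch.where(observed_mask, X, recon)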

References

Showing 10 of 17 references.

MIDA: Multiple Imputation Using Denoising Autoencoders

TLDR
Evaluations on several real-life datasets show that the proposed multiple imputation model, based on overcomplete deep denoising autoencoders, significantly outperforms current state-of-the-art methods under varying conditions while simultaneously improving end-of-the-line analytics.
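
As a rough illustration of the MIDA idea, the sketch below builds an overcomplete denoising autoencoder in PyTorch, where hidden layers are wider than the input and input dropout plays the role of the denoising corruption. Layer widths, activations, and the dropout rate are illustrative assumptions, not the paper's exact configuration.

import torch.nn as nn

def mida_like(n_features, widen=7):
    h1, h2 = n_features + widen, n_features + 2 * widen  # overcomplete widths
    return nn.Sequential(
        nn.Dropout(0.5),                              # stochastic input corruption
        nn.Linear(n_features, h1), nn.Tanh(),
        nn.Linear(h1, h2), nn.Tanh(),                 # encoder
        nn.Linear(h2, h1), nn.Tanh(),
        nn.Linear(h1, n_features),                    # decoder
    )

Multiple imputations can then be drawn by keeping dropout active at inference time (model.train()) and sampling several reconstructions of the same row.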

Missing Data Imputation using Optimal Transport

TLDR
This work uses optimal transport distances to quantify the criterion that two random batches from the same dataset should share the same distribution, turns it into a loss function for imputing missing data values, and proposes practical methods to minimize these losses using end-to-end learning.
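
Below is a hedged sketch of that idea: missing entries are treated as free parameters and trained to minimize a Sinkhorn divergence between random batches of the imputed data. The use of the third-party geomloss package is an assumption for the sketch; the paper's own implementation may differ.

import torch
from geomloss import SamplesLoss

def ot_impute(X, mask, steps=500, batch=128, lr=1e-2):
    # X: (n, d) tensor with zeros at missing entries; mask: True where observed.
    fills = torch.zeros_like(X, requires_grad=True)  # learnable imputations
    sinkhorn = SamplesLoss("sinkhorn", p=2, blur=0.05)
    opt = torch.optim.Adam([fills], lr=lr)
    n = X.shape[0]
    for _ in range(steps):
        X_imp = torch.where(mask, X, fills)
        i = torch.randint(0, n, (batch,))
        j = torch.randint(0, n, (batch,))
        loss = sinkhorn(X_imp[i], X_imp[j])  # two batches should match in law
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.where(mask, X, fills.detach())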

Reviewing Autoencoders for Missing Data Imputation: Technical Trends, Applications and Outcomes

TLDR
This study surveys the use of autoencoders for the imputation of tabular data across 26 works published between 2014 and 2020, and shows that denoising autoencoders outperform their competitors, particularly the often-used statistical methods.

Class Noise vs. Attribute Noise: A Quantitative Study

TLDR
A systematic evaluation of the effect of noise in machine learning that separates noise into two categories, class noise and attribute noise, and investigates the relationship between attribute noise and classification accuracy, the impact of noise at different attributes, and possible solutions for handling attribute noise.

Dealing with Predictive-but-Unpredictable Attributes in Noisy Data Sources

TLDR
This paper presents a study on identifying, cleansing, and measuring noise for predictive-but-unpredictable attributes, and shows that the proposed strategies are more effective and more efficient than previous alternatives.

GAIN: Missing Data Imputation using Generative Adversarial Nets

TLDR
This work proposes GAIN, a novel method for imputing missing data that adapts the well-known Generative Adversarial Nets (GAN) framework and significantly outperforms state-of-the-art imputation methods.
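
As a compact sketch of GAIN's adversarial setup: the generator imputes missing entries while the discriminator tries to recover the missingness mask. The hint construction and loss weighting below are simplified assumptions, faithful to the published method only in spirit.

import torch
import torch.nn as nn
import torch.nn.functional as F

d = 10                                                 # number of features (illustrative)
G = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid())   # generator: (data, mask) -> imputations
D = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid())   # discriminator: (data, hint) -> mask guess
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

def gain_step(x, m, hint_rate=0.9, alpha=10.0):
    # x: (n, d) data with zeros at missing slots; m: (n, d) float mask, 1 = observed.
    z = torch.rand_like(x)                             # noise for missing slots
    g_out = G(torch.cat([m * x + (1 - m) * z, m], dim=1))
    x_hat = m * x + (1 - m) * g_out                    # imputed sample
    hint = m * (torch.rand_like(m) < hint_rate).float()  # simplified hint
    # Discriminator update: predict which entries were observed.
    d_loss = F.binary_cross_entropy(D(torch.cat([x_hat.detach(), hint], dim=1)), m)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Generator update: fool D on missing entries, reconstruct observed ones.
    d_prob = D(torch.cat([x_hat, hint], dim=1))
    g_loss = -torch.log(d_prob + 1e-8)[(1 - m).bool()].mean() \
             + alpha * ((m * (x - g_out)) ** 2).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()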

The pairwise attribute noise detection algorithm

TLDR
This work presents a novel approach for detecting instances with attribute noise and demonstrates its usefulness with case studies using two different real-world software measurement data sets, showing that PANDA provides better noise detection performance than the DM algorithm.
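
In spirit, the pairwise approach scores each instance by how atypical its value on one attribute is among instances that are similar on another attribute, accumulated over all attribute pairs. The following is a hypothetical simplification in NumPy; the binning and scoring details are assumptions, not the published PANDA algorithm.

import numpy as np

def pairwise_noise_scores(X, n_bins=5):
    # X: (n, d) numeric array; returns one noise score per instance.
    n, d = X.shape
    scores = np.zeros(n)
    for j in range(d):                      # attribute being checked
        for k in range(d):                  # pairing attribute
            if j == k:
                continue
            edges = np.quantile(X[:, k], np.linspace(0, 1, n_bins + 1)[1:-1])
            bins = np.digitize(X[:, k], edges)
            for b in np.unique(bins):
                idx = bins == b
                mu = X[idx, j].mean()
                sd = X[idx, j].std() + 1e-8
                scores[idx] += np.abs(X[idx, j] - mu) / sd  # deviation within partition
    return scores                           # higher = more likely attribute noise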

Trajectories, bifurcations, and pseudo-time in large clinical datasets: applications to myocardial infarction and diabetes data

TLDR
A pseudo-time quantification-based approach makes it possible to apply the methods developed for dynamical disease phenotyping and illness trajectory analysis (diachronic data analysis) to synchronic observational data.

Applications of multiple imputation in medical studies: from AIDS to NHANES

TLDR
This paper reviews three applications of Rubin's multiple imputation method, including estimating the reporting delay in acquired immune deficiency syndrome (AIDS) surveillance systems for the purpose of estimating survival time after AIDS diagnosis, and handling nonresponse in the United States National Health and Nutrition Examination Surveys (NHANES).
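
For context, Rubin's combining rules pool an estimate computed on each of m imputed datasets into a single estimate whose variance reflects both within- and between-imputation uncertainty. A small worked sketch:

import numpy as np

def rubin_pool(estimates, variances):
    # estimates, variances: length-m arrays, one entry per imputed dataset.
    m = len(estimates)
    q_bar = np.mean(estimates)              # pooled point estimate
    u_bar = np.mean(variances)              # within-imputation variance
    b = np.var(estimates, ddof=1)           # between-imputation variance
    total_var = u_bar + (1 + 1 / m) * b     # Rubin's total variance
    return q_bar, total_var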

Polishing Blemishes: Issues in Data Correction

  • C. Teng, Computer Science, IEEE Intell. Syst., 2004
TLDR
Polishing is compared to two alternative approaches to handling data imperfections, focusing on how to evaluate and validate data correction mechanisms.