Corpus ID: 15934251

A Review of Missing Data Treatment Methods

@inproceedings{Peng2005ARO,
  title={A Review of Missing Data Treatment Methods},
  author={Liu Zonghai Peng and Lei Lei},
  year={2005}
}
Missing data is a common data quality problem, and most real datasets contain missing values. This paper analyzes missing data mechanisms and treatment rules. Popular and conventional missing data treatment methods are introduced and compared, and the environments suited to each method are analyzed in experiments. The methods are classified into categories according to their characteristics.
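The abstract does not spell out the treatment methods it compares. As a rough illustration only, the sketch below shows three conventional treatments that are named in the summaries on this page (case deletion, mean imputation, and k-nearest-neighbor imputation). It is not the paper's implementation: the function names, the use of `None` to mark a missing value, and the toy data are all assumptions made for this example.

```python
import math

# Assumption: a dataset is a list of numeric rows, and None marks a missing value.

def case_deletion(rows):
    """Drop every row that contains at least one missing value."""
    return [r for r in rows if None not in r]

def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column.
    Assumes every column has at least one observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def knn_impute(rows, k=2):
    """Replace each missing value with the mean of that column over the k nearest
    rows that have the column observed. Distance is the root mean squared
    difference over the columns both rows have observed.
    Assumes each column is observed in at least one other row."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf  # no common observed columns: treat as farthest
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = []
    for i, row in enumerate(rows):
        filled = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Candidate donors: other rows where column j is observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            filled[j] = sum(val for _, val in donors) / len(donors)
        result.append(filled)
    return result

# Hypothetical toy data: the second row is missing its second attribute.
rows = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [10.0, 20.0]]
complete_only = case_deletion(rows)
mean_filled = mean_impute(rows)
knn_filled = knn_impute(rows, k=2)
```

On this toy data the mean fill is pulled toward the outlying row, while the k-NN fill uses only the two closest rows, which is one reason several of the studies listed here compare the two.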
Comparison Method for Handling Missing Data in Clinical Studies
TLDR
The KNN imputation method provides better accuracy than other methods on clinical datasets covering chronic kidney disease, Indian Pima diabetes, thyroid, and hepatitis.
MISSING DATA IMPUTATION IN CARDIAC DATA SET (SURVIVAL PROGNOSIS)
TLDR
The performance of different data imputation methods is analyzed in a task that aims to predict the survival probability of cardiac patients, finding that k-NN methods may be useful to provide relatively accurate estimations with lower error variability.
Missing Value Imputation in Multi Attribute Data Set
Data mining has made great progress in recent years, but the problem of missing data or values remains a great challenge. Missing data or values in a dataset can affect the ...
Reasoning with Missing Values in Multi Attribute Datasets
The presence of missing data in a dataset can affect the performance of a classifier, which makes it difficult to extract useful information from the dataset. The dataset taken for this study is student ...
An Ensemble approach on Missing Value Handling in Hepatitis Disease Dataset
TLDR
This paper investigates the use of a machine learning technique as a missing value imputation process for incomplete Hepatitis data and reveals that classifier performance improves when the Bagging-based imputation algorithm is used to predict missing attribute values.
Estimation of Missing Values Using Decision Tree Approach
Data mining has made great progress in recent years, but the problem of missing data or values remains a great challenge. Missing data or values in a dataset can affect the ...
Handling Missing Values in Chronic Kidney Disease Datasets Using KNN, K-Means and K-Medoids Algorithms
TLDR
This paper presents a framework that assists in imputing missing values in large Chronic Kidney Disease (CKD) datasets and uses three machine learning algorithms, i.e., K-Nearest Neighbors, K-Means, and K-Medoids clustering, to impute the missing values.
Comparison of K-Means clustering and statistical outliers in reducing medical datasets
  • T. Santhanam, M. Padmavathi
  • Computer Science
  • 2014 International Conference on Science Engineering and Management Research (ICSEMR)
  • 2014
TLDR
This research work compares the data reduction percentage achieved by K-Means and Statistical Outliers for all three methods of imputation and shows that the reduction rate of statistical outliers is lower than that of K-Means clustering.
Treatment of Missing Values in Data Mining
TLDR
Techniques and algorithms for dealing with missing values are reviewed, with the goal of obtaining a pure data set (i.e., a data set without missing values), which in turn leads to correct and accurate decision making.
Normalization and Outlier Removal in Class Center-Based Firefly Algorithm for Missing Value Imputation
TLDR
This study proposes combining normalization and outlier removal before imputing missing values with several methods (mean, random value, regression, multiple imputation, KNN, and C3-FA), and shows that the proposed method is able to reproduce the real values of the data (prediction accuracy) and maintain distribution accuracy.

References

An analysis of four missing data treatment methods for supervised learning
TLDR
This analysis indicates that missing data imputation based on the k-nearest neighbor algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data, and can also outperform mean or mode imputation, a method broadly used to treat missing values.
A Comparison Study of Missing Value Processing Methods
TLDR
Five models are built to improve prediction efficiency, and the results show that using a naive Bayesian classifier to predict missing values iteratively, in decreasing order of information gain, is effective.
The Treatment of Missing Values and its Effect on Classifier Accuracy
TLDR
This paper carries out experiments with twelve datasets to evaluate the effect on the misclassification error rate of four methods for dealing with missing values: case deletion, mean imputation, median imputation, and the KNN imputation procedure.
A Comparison of Several Approaches to Missing Attribute Values in Data Mining
TLDR
Using the Wilcoxon matched-pairs signed rank test, it is concluded that the C4.5 approach and the method of ignoring examples with missing attribute values are the best methods among all nine approaches.
Techniques for Dealing with Missing Data in Knowledge Discovery Tasks
TLDR
This report reviews the main missing data techniques (MDTs), trying to highlight their advantages and disadvantages, and presents a taxonomy of MDTs.
Trends in Data Mining and Knowledge Discovery
TLDR
This chapter describes a six-step DMKD process model and its component technologies, which help to design flexible, semiautomated, and easy-to-use DMKD models that enable building knowledge repositories and allow communication between several data mining tools, databases, and knowledge repositories.
Efficient Algorithms for Dealing with Missing values in Knowledge Discovery, Master Degree Thesis
  • 2001
Efficient Algorithms for Dealing with Missing values in Knowledge Discovery, Master Degree Thesis, Japan
  • Advanced Institute of Science and Technology,
  • 2001
Data Mining: Concepts and Techniques
  • 2000