Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance
@article{Abdallah2018DistancesOI, title={Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance}, author={Loai Abdallah and Mahmoud Kaiyal}, journal={International Journal of Medical and Health Sciences}, year={2018}, volume={12}, pages={314-319} }
Missing values are a common problem in real-world datasets. Many algorithms have been developed to deal with this problem; most of them replace the missing values with a fixed value computed from the observed values. In our work, we use a distance function based on the Bhattacharyya distance, which measures the similarity of two probability distributions, to measure the distance between objects with missing values. The proposed distance distinguishes between known…
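As a point of reference for the abstract above, the following is a minimal sketch of the standard Bhattacharyya distance between two discrete probability distributions, D_B(p, q) = -ln(sum_i sqrt(p_i * q_i)). It is not the paper's missing-value distance function, whose definition is cut off in the abstract; the function name `bhattacharyya_distance` and the example distributions are assumptions chosen for illustration.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions.

    p and q are sequences of non-negative probabilities over the same
    outcomes, each assumed to sum to 1 (not checked here).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Bhattacharyya coefficient: overlap between the two distributions.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if bc == 0.0:
        # No shared support: the distance is unbounded.
        return math.inf
    # Clamp at 0 so rounding error cannot produce a tiny negative value.
    return max(0.0, -math.log(bc))


# Illustrative distributions (hypothetical values).
same = bhattacharyya_distance([0.5, 0.5], [0.5, 0.5])
close = bhattacharyya_distance([0.5, 0.5], [0.6, 0.4])
far = bhattacharyya_distance([0.5, 0.5], [0.9, 0.1])
disjoint = bhattacharyya_distance([1.0, 0.0], [0.0, 1.0])
```

Identical distributions give a distance of 0, and the distance grows as the overlap shrinks, becoming infinite when the distributions share no support.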
References
A Distance Function for Data with Missing Values and Its Application
- Computer Science
- 2013
This paper defines a distance function for unlabeled datasets with missing values using the Bhattacharyya distance, which measures the similarity of two probability distributions, and opts for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity.
An analysis of four missing data treatment methods for supervised learning
- Computer Science
- Appl. Artif. Intell.
- 2003
This analysis indicates that missing data imputation based on the k-nearest neighbor algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data, and can also outperform mean or mode imputation, a method broadly used to treat missing values.
A Comparison of Several Approaches to Missing Attribute Values in Data Mining
- Computer Science
- Rough Sets and Current Trends in Computing
- 2000
Using the Wilcoxon matched-pairs signed rank test, it is concluded that the C4.5 approach and the method of ignoring examples with missing attribute values are the best methods among all nine approaches.
"Missing is useful": missing values in cost-sensitive decision trees
- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2005
This paper discusses and compares several strategies that utilize only known values and that "missing is useful" for cost reduction in cost-sensitive decision tree learning and considers both test costs and misclassification costs.
Techniques for Dealing with Missing Data in Knowledge Discovery Tasks
- Computer Science
- 2004
This report reviews the main missing data techniques (MDTs), trying to highlight their advantages and disadvantages, and presents a taxonomy of MDTs.
Shell-neighbor method and its application in missing data imputation
- Computer Science
- Applied Intelligence
- 2009
This paper introduces a new imputation approach called SN (Shell Neighbors) imputation, or simply SNI, and demonstrates that the generalized SNI method outperforms the kNN imputation method at imputation accuracy and classification accuracy.
Missing-Data Methods for Generalized Linear Models
- Computer Science
- 2005
This work examines data that are missing at random and nonignorable missing, and compares four common approaches for inference in generalized linear models with missing covariate data: maximum likelihood (ML), multiple imputation (MI), fully Bayesian (FB), and weighted estimating equations (WEEs).
Trends in Data Mining and Knowledge Discovery
- Computer Science
- 2005
This chapter describes a six-step DMKD process model and its component technologies, which help to design flexible, semiautomated, and easy-to-use DMKD models to enable building knowledge repositories and allowing for communication between several data mining tools, databases, and knowledge repositories.
Review: a gentle introduction to imputation of missing values.
- Mathematics
- Journal of Clinical Epidemiology
- 2006
Statistical Analysis With Missing Data
- Computer Science
- Technometrics
- 2003
Generalized Estimating Equations is a good introductory book for analyzing continuous and discrete correlated data using GEE methods and provides good guidance for analyzing correlated data in biomedical studies and survey studies.