• Corpus ID: 52906258

Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance

Authors: Loai Abdallah and Mahmoud Kaiyal
Journal: International Journal of Medical and Health Sciences
Missing values are a common problem in real-world datasets. Many algorithms have been developed to deal with this problem; most of them replace the missing values with a fixed value computed from the observed values. In our work, we use a distance function based on the Bhattacharyya distance, which measures the similarity of two probability distributions, to measure the distance between objects with missing values. The proposed distance distinguishes between known…
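As a rough illustration of the quantity the abstract builds on, the sketch below computes the standard Bhattacharyya distance between two discrete probability distributions defined over the same bins. It is not the authors' proposed distance for incomplete data, whose details are cut off above; the function name `bhattacharyya_distance` and the list-based inputs are assumptions made for this example only.

```python
import math


def bhattacharyya_distance(p, q):
    """Standard Bhattacharyya distance between two discrete distributions.

    p and q are sequences of probabilities over the same bins (each is
    assumed to sum to 1). This is the textbook definition only; it does
    not include any handling of missing values.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")

    # Bhattacharyya coefficient: overlap between the two distributions,
    # 1 for identical distributions and 0 for disjoint support.
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

    if bc == 0.0:
        # No overlap at all: the distance is unbounded.
        return math.inf

    # Clamp against floating-point overshoot so the result is never negative.
    return max(0.0, -math.log(min(bc, 1.0)))


if __name__ == "__main__":
    same = bhattacharyya_distance([0.5, 0.5], [0.5, 0.5])
    skewed = bhattacharyya_distance([0.5, 0.5], [0.9, 0.1])
    print(same, skewed)
```

Identical distributions give a distance of zero, and the distance grows as the overlap between the two distributions shrinks.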

A Distance Function for Data with Missing Values and Its Application

This paper defines a distance function for unlabeled datasets with missing values using the Bhattacharyya distance, which measures the similarity of two probability distributions, and opts for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity.

An analysis of four missing data treatment methods for supervised learning

This analysis indicates that missing data imputation based on the k-nearest neighbor algorithm can outperform the internal methods used by C4.5 and CN2 to treat missing data, and can also outperform mean or mode imputation, a method broadly used to treat missing values.

A Comparison of Several Approaches to Missing Attribute Values in Data Mining

Using the Wilcoxon matched-pairs signed rank test, it is concluded that the C4.5 approach and the method of ignoring examples with missing attribute values are the best methods among all nine approaches.

"Missing is useful": missing values in cost-sensitive decision trees

This paper discusses and compares several strategies that utilize only known values, on the premise that "missing is useful" for cost reduction in cost-sensitive decision tree learning, and considers both test costs and misclassification costs.

Techniques for Dealing with Missing Data in Knowledge Discovery Tasks

This report reviews the main missing data techniques (MDTs), trying to highlight their advantages and disadvantages, and presents a taxonomy of MDTs.

Shell-neighbor method and its application in missing data imputation

This paper introduces a new imputation approach called SN (Shell Neighbors) imputation, or simply SNI, and demonstrates that the generalized SNI method outperforms the kNN imputation method in both imputation accuracy and classification accuracy.

Missing-Data Methods for Generalized Linear Models

This work examines data that are missing at random and data that are nonignorably missing, and compares four common approaches for inference in generalized linear models with missing covariate data: maximum likelihood (ML), multiple imputation (MI), fully Bayesian (FB), and weighted estimating equations (WEEs).

Trends in Data Mining and Knowledge Discovery

This chapter describes a six-step DMKD process model and its component technologies, which help to design flexible, semiautomated, and easy-to-use DMKD models to enable building knowledge repositories and allowing for communication between several data mining tools, databases, and knowledge repositories.

Statistical Analysis With Missing Data

  • N. Lazar
  • Computer Science
  • 2003
Generalized Estimating Equations is a good introductory book for analyzing continuous and discrete correlated data using GEE methods and provides good guidance for analyzing correlated data in biomedical studies and survey studies.