Optimal Feature Selection for Cluster Based Ensemble Classifier Using Meta-Heuristic Function for Medical Disease Data Classification for Symptom Prediction

Abstract

The diversity and applicability of data mining in medical science are increasing day by day, particularly for predicting the symptoms of disease. Data mining provides many techniques for mining data in several fields, including association rule mining, clustering, classification, and the emerging technique known as ensemble classification. The ensemble process increases the classification rate and improves the majority voting over individual classification algorithms such as KNN, decision tree, and support vector machine. A newer paradigm of ensemble classification is the cluster-oriented ensemble technique. This research paper applies classification based on classifier selection to medical disease data and proposes a clustering-based classifier selection method. In this method, several clusters are selected for the ensemble process. The average performance of each classifier on the selected clusters is then calculated, and the classifier with the best average performance is chosen to classify the given data. The average performance is computed as a weighted average, with weights calculated according to the distances between the given data and each selected cluster. There are generally two types of multiple-classifier combination: multiple-classifier selection and multiple-classifier fusion. Multiple-classifier selection assumes that each classifier has expertise in some local regions of the feature space and attempts to find which classifier has the highest local accuracy in the vicinity of an unknown test sample. This classifier is then nominated to make the final decision of the system. The performance of a classifier is frequently the most important aspect of its value and is measured using a variety of well-known methods and metrics. By contrast, the knowledge a classifier conveys, meaning how well its workings can be understood, is often treated as less important or even neglected.
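The clustering-based classifier selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract only says that weights depend on the distance between the data point and each selected cluster, so the inverse-distance weighting, the choice of the `k` nearest clusters, the function names, and the accuracy figures are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_classifier(sample, centroids, cluster_accuracy, k=2, eps=1e-9):
    """Pick the classifier with the best distance-weighted average accuracy.

    centroids        : list of cluster centre vectors
    cluster_accuracy : dict mapping classifier name -> per-cluster accuracy list
    k                : number of nearest clusters to select (assumption)
    """
    distances = [euclidean(sample, c) for c in centroids]
    # Select the k clusters closest to the sample.
    nearest = sorted(range(len(centroids)), key=lambda i: distances[i])[:k]
    # Assumed weighting: closer clusters count more (inverse distance).
    weights = {i: 1.0 / (distances[i] + eps) for i in nearest}
    total = sum(weights.values())

    def weighted_accuracy(name):
        return sum(weights[i] * cluster_accuracy[name][i] for i in nearest) / total

    return max(cluster_accuracy, key=weighted_accuracy)

# Hypothetical centroids and per-cluster validation accuracies.
centroids = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
cluster_accuracy = {
    "knn":  [0.90, 0.60, 0.70],
    "tree": [0.70, 0.95, 0.65],
    "svm":  [0.80, 0.80, 0.80],
}

near_first = select_classifier((1.0, 1.0), centroids, cluster_accuracy)
near_second = select_classifier((9.0, 9.0), centroids, cluster_accuracy)
print(near_first, near_second)
```

In this toy setup a sample near the first centroid is routed to the classifier that performed best there, and a sample near the second centroid is routed to a different one, which is the local-expertise idea behind classifier selection.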
However, it is vital for the users of a classifier, because they trust it more if they can understand how it works, and because additional knowledge about the relations in the observed data can be extracted from the classifier involved. Consequently, some older methods focus on the knowledge (comprehensibility) of learned classifiers, or on transforming non-comprehensible classifiers into a human-understandable structure. There is a lack of algorithms that treat the accuracy and the knowledge of classifiers as equally significant, turning the construction of a classifier into a heuristic optimization problem. Such algorithms are especially important in domains where some parts of the attribute space can be classified with high accuracy using a comprehensible classifier, while other parts require non-comprehensible classifiers to achieve the required classification accuracy. Combining different clustering outputs (cluster ensemble or clustering aggregation) emerged as an alternative approach for improving the quality of the results of clustering algorithms. It is based on the success of combining supervised classifiers. Given a set of objects, a cluster ensemble method consists of two principal steps: generation, which creates a set of partitions of these objects, and the consensus function, which computes a new partition that integrates all the partitions obtained in the generation step. Over the past years, many clustering ensemble techniques have been proposed, resulting in new ways to approach the problem together with new fields of application for these techniques. Besides the presentation of the main methods, the introduction of a taxonomy of the different tendencies and critical comparisons among the methods is really important in order to give a survey practical application.

International Journal of Recent Trends in Engineering & Research (IJRTER) Volume 02, Issue 05; May 2016 [ISSN: 2455-1457] @IJRTER-2016, All Rights Reserved 44

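The two steps of a cluster ensemble, generation and consensus function, can be illustrated with a small sketch. The abstract does not say which consensus function is used, so the example below assumes a co-association approach: count how often two objects share a cluster across the generated partitions, then group objects that co-occur in more than half of them. The threshold, function names, and the three input partitions are hypothetical.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    return [
        [sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
        for i in range(n)
    ]

def consensus_partition(partitions, threshold=0.5):
    """Assumed consensus function: connect objects whose co-association
    exceeds the threshold and label each connected component."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first walk over strongly co-associated objects
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Generation step (hypothetical output of three clustering runs on 5 objects).
partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],  # same grouping, different label names
    [0, 0, 0, 1, 1],  # object 2 assigned differently
]

consensus = consensus_partition(partitions)
print(consensus)
```

Working on co-occurrence rather than raw labels sidesteps the fact that cluster label names are arbitrary and differ between runs.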
Thus, given the importance that clustering ensembles have gained in cluster analysis, we have made a critical study of the different approaches and the existing methods. Feature selection is used to select a subset of relevant features from the data set in order to build robust classification models. Classification accuracy is improved by removing the most irrelevant and redundant features from the data set. An ensemble model is proposed to improve classification accuracy by combining the predictions of multiple classifiers. In this research paper we use a cluster-based ensemble classifier. The performance of each classifier and of the ensemble model is evaluated using statistical measures such as accuracy, specificity, and sensitivity. Classification of medical data is an important task in the prediction of any disease, and it can help doctors in their diagnostic decisions. A cluster-oriented ensemble classifier generates a set of classifiers, instead of a single classifier, for the classification of a new object, in the hope that combining the answers of multiple classifiers results in better performance. We demonstrate the algorithmic use of the classification technique by extending SVM, one of the most popular binary classification algorithms. From the studies above, the key to improving the cluster-oriented classifier is to improve binary classification. In the final part of this work, we include an empirical evaluation that aims at understanding binary classification better in the context of ensemble learning.
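As a small illustration of the evaluation measures named above, the sketch below combines the predictions of three classifiers by majority vote and computes accuracy, sensitivity, and specificity for a binary problem. The labels and predictions are made-up values, the function names are hypothetical, and plain majority voting stands in for whatever combination rule the paper actually uses.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists by majority vote, sample by sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate).

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical ground truth (1 = disease present) and classifier outputs.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
knn    = [1, 0, 1, 0, 0, 1, 0, 0]
tree   = [1, 1, 0, 0, 0, 0, 0, 0]
svm    = [0, 1, 1, 0, 1, 0, 0, 1]

ensemble = majority_vote([knn, tree, svm])
metrics = binary_metrics(y_true, ensemble)
print(ensemble, metrics)
```

With an odd number of classifiers and two classes, the vote never ties. In medical settings sensitivity and specificity are reported alongside accuracy because a missed diagnosis and a false alarm carry different costs.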

15 Figures and Tables

Cite this paper

@inproceedings{Yadav2016OptimalFS, title={Optimal Feature Selection for Cluster Based Ensemble Classifier Using Meta-Heuristic Function for Medical Disease Data Classification for Symptom Prediction}, author={D H Sharath Yadav}, year={2016} }