Feature Selection and Ensemble Hierarchical Cluster-Based Under-Sampling Approach for Extremely Imbalanced Datasets (Application to Gene Classification)

@inproceedings{Soltani2011FeatureSA,
  title={Feature Selection and Ensemble Hierarchical Cluster-Based Under-Sampling Approach for Extremely Imbalanced Datasets (Application to Gene Classification)},
  author={Sima Soltani and Javad Sadri},
  year={2011}
}
Class distribution in many informative datasets is highly imbalanced. A highly imbalanced dataset contains a large number of negative samples and only a small fraction of positive samples, which makes it difficult to classify. In this paper we propose an Ensemble Hierarchical Cluster-based Undersampling approach for classifying huge and extremely imbalanced datasets. Hierarchical clustering is used to remove negative samples that are dissimilar to positive samples. The ensemble technique collects…
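
The cluster-based under-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The linkage criterion (single linkage), the Euclidean distance, the number of clusters, and the rule "drop every negative that falls in a cluster with no positive sample" are all assumptions made for illustration, and the function names are hypothetical. The ensemble step is left out because the abstract does not describe it in full.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clusters(points, n_clusters):
    """Naive single-linkage agglomerative clustering (assumed linkage).

    Starts with one cluster per sample and repeatedly merges the two
    closest clusters until n_clusters remain. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a, b in combinations(range(len(clusters)), 2):
            d = min(
                euclidean(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # a < b, so index a is still valid
    return clusters


def undersample_negatives(X, y, n_clusters):
    """Keep only samples whose cluster contains at least one positive.

    Assumed rule: negatives clustered away from every positive are
    treated as dissimilar to the positive class and removed.
    """
    keep = []
    for members in agglomerative_clusters(X, n_clusters):
        if any(y[i] == 1 for i in members):
            keep.extend(members)
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: one positive (label 1) and seven negatives (label 0).
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # group near the positive
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # distant negatives
     (9.0, 0.0), (9.1, 0.0)]               # distant negatives
y = [1, 0, 0, 0, 0, 0, 0, 0]

X_small, y_small = undersample_negatives(X, y, n_clusters=3)
# y_small == [1, 0, 0]: only the negatives near the positive remain
```

The pairwise search makes this sketch far too slow for the huge datasets the abstract targets. It is meant only to show the filtering rule on a toy example.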