A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets

@article{fernandez-pareto-ensemble,
  title={A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets},
  author={Alberto Fern{\'a}ndez and Crist{\'o}bal Jos{\'e} Carmona and Mar{\'i}a Jos{\'e} del Jes{\'u}s and Francisco Herrera},
  journal={International Journal of Neural Systems},
  volume={27},
  number={6},
}
Imbalanced classification concerns problems with an uneven distribution of examples among classes. Selecting instances from all classes addresses the imbalance directly by finding the most appropriate class distribution for the learning task, while also potentially removing noise and difficult borderline examples.
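As a minimal illustration of the instance-selection idea described above (not the paper's actual Pareto-based method), the sketch below randomly undersamples every class to the size of the smallest one; the data and class labels are fabricated for the example.

```python
import random
from collections import defaultdict

def undersample_to_balance(X, y, seed=0):
    """Randomly undersample every class down to the size of the
    smallest class, yielding a balanced training set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = min(len(items) for items in by_class.values())
    Xs, ys = [], []
    for label, items in by_class.items():
        for xi in rng.sample(items, target):
            Xs.append(xi)
            ys.append(label)
    return Xs, ys

# Hypothetical multi-class imbalanced data: 6 of 'a', 3 of 'b', 2 of 'c'
X = list(range(11))
y = ['a'] * 6 + ['b'] * 3 + ['c'] * 2
Xb, yb = undersample_to_balance(X, y)
# Each class now contributes exactly 2 instances
```

Real instance-selection methods such as the one in this paper choose which instances to keep via optimization rather than at random, precisely so that noisy and borderline examples are the ones discarded.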


Multi-criteria analysis involving Pareto-optimal misclassification tradeoffs on imbalanced datasets
This work accounts for the conflict among the per-class learning losses and uses a deterministic multi-objective optimization method, MONISE, to build a set of solutions with diverse misclassification tradeoffs among the classes.
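The core object in such multi-objective formulations is the Pareto front of non-dominated solutions. A minimal sketch of Pareto filtering over per-class error vectors (the candidate error rates below are hypothetical, and this is standard dominance filtering, not the MONISE method itself):

```python
def pareto_front(points):
    """Return the non-dominated points, minimizing every objective.

    A point p dominates q if p <= q in all objectives and p < q in at least one.
    """
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (class-A error, class-B error) pairs for candidate classifiers
errors = [(0.1, 0.4), (0.2, 0.2), (0.4, 0.1), (0.3, 0.3)]
# (0.3, 0.3) is dominated by (0.2, 0.2); the rest trade off the two classes
print(pareto_front(errors))  # prints [(0.1, 0.4), (0.2, 0.2), (0.4, 0.1)]
```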
Imbalanced Classification with Multiple Classes
Dealing with multiple classes is already a hard problem, and it becomes more severe in the presence of imbalance; most of the techniques proposed for binary imbalanced classification are not directly applicable to multiple classes.
A memetic approach for training set selection in imbalanced data sets
The best training samples are selected with the goal of improving classifier performance on imbalanced data, and several heuristic methods are presented that use local information to decide whether each training sample should be removed or retained.
Revisiting data complexity metrics based on morphology for overlap and imbalance: snapshot, new overlap number of balls metrics and singular problems prospect
This research work revisits data complexity metrics based on morphology, in particular ball coverage by classes, which provide both good estimates of class overlap and strong correlations with classification performance.
Joint feature and instance selection using manifold data criteria: application to image classification
  F. Dornaika, Artificial Intelligence Review, 2020
This paper addresses joint feature and instance selection by combining feature-subset relevance with sparse-modeling representative selection, and evaluates the proposed schemes on image classification using nearest neighbor and support vector machine classifiers.
Ensemble Learning via Multimodal Multiobjective Differential Evolution and Feature Selection
A novel ensemble method is proposed that uses a multimodal multiobjective differential evolution (MMODE) algorithm to select feature subsets and optimize the parameters of base classifiers; experimental results on several benchmark classification datasets show that the proposed algorithm is effective.


A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches
A taxonomy of ensemble-based methods for the class imbalance problem is proposed, categorizing each proposal by the inner ensemble methodology on which it is based; a thorough empirical comparison of the most significant published approaches then examines whether any of them makes a difference.
On the k-NN performance in a challenging scenario of imbalance and overlapping
This local model is compared with other machine learning algorithms, examining how their behaviour depends on several data complexity features (global imbalance, size of the overlap region, and its local imbalance); several conclusions useful for classifier design are drawn.
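As a toy illustration of why k-NN struggles in overlap regions with local imbalance (all data below is fabricated), a plain majority-vote k-NN misclassifies a minority point whose neighborhood is dominated by the majority class:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Plain majority-vote k-NN on 1-D points (toy setup).

    train is a list of (position, label) pairs."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Fabricated overlap region: one minority point surrounded by majority points
train = [(0.0, 'maj'), (0.2, 'maj'), (0.4, 'min'), (0.5, 'maj'), (0.7, 'maj')]
# Querying exactly at the minority point still yields the majority label
print(knn_predict(train, 0.4, k=3))  # prints maj — the minority point is outvoted
```

With k=1 the same query would return 'min', which is one reason the choice of k interacts strongly with local imbalance.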
Evolutionary rule-based systems for imbalanced data sets
This paper adapts and analyzes learning classifier systems (LCSs) for challenging imbalanced data sets and lays the groundwork for further study of which combination of re-sampling technique and learner is best suited to each kind of problem.