Developing New Fitness Functions in Genetic Programming for Classification With Unbalanced Data

Urvesh Bhowan, Mark Johnston, Mengjie Zhang
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)
Machine learning algorithms such as genetic programming (GP) can evolve biased classifiers when data sets are unbalanced. Data sets are unbalanced when at least one class is represented by only a small number of training examples (called the minority class) while other classes make up the majority. In this scenario, classifiers can have good accuracy on the majority class but very poor accuracy on the minority class(es) due to the influence that the larger majority class has on traditional… 

New Fitness Functions in Genetic Programming for Classification with High-dimensional Unbalanced Data

New GP fitness functions are proposed to address class imbalance in classification with high-dimensional unbalanced data; one of them approximates the area under the curve (AUC) in order to reduce training time.
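
For context, the AUC of a scoring classifier equals the Wilcoxon-Mann-Whitney statistic, a pairwise comparison whose cost grows with the product of the two class sizes. The sketch below computes that exact quantity; it is not the approximation proposed in the paper, which is not described here. The function name and the sample scores are hypothetical.

```python
def auc_wmw(minority_scores, majority_scores):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    (minority, majority) pairs ranked correctly, with ties counting half."""
    wins = 0.0
    for s_min in minority_scores:
        for s_maj in majority_scores:
            if s_min > s_maj:
                wins += 1.0
            elif s_min == s_maj:
                wins += 0.5
    return wins / (len(minority_scores) * len(majority_scores))


# Hypothetical program outputs (higher means "more likely minority class").
minority_scores = [0.9, 0.4]
majority_scores = [0.1, 0.4, 0.7]
auc = auc_wmw(minority_scores, majority_scores)
print(auc)
```

Because every minority example is compared with every majority example, evaluating this measure for each program in each generation can be expensive, which is one plausible reason to look for a cheaper approximation.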

High-Dimensional Unbalanced Binary Classification by Genetic Programming with Multi-Criterion Fitness Evaluation and Selection

A new fitness function is developed that considers two criteria: the approximation of the area under the curve (AUC) and the classification clarity (i.e., how well a program can separate the two classes).

Improving Fitness Functions in Genetic Programming for Classification on Unbalanced Credit Card Datasets

Two fitness functions from previous studies are examined and two new fitness functions are developed to evolve GP classifiers with superior accuracy on the minority class and overall, encouraging fitter solutions on both the minority and the majority classes.

Reuse of program trees in genetic programming with a new fitness function in high-dimensional unbalanced classification

A new fitness function is developed to address the class imbalance issue, and a strategy is proposed to reuse previous good GP trees when using multiple GP processes to build a multi-classifier system.

Genetic programming for high-dimensional imbalanced classification with a new fitness function and program reuse mechanism

Experimental results show that the proposed GP approach to high-dimensional imbalanced classification generally outperforms other GP methods and traditional classification algorithms using sampling methods to solve the problem of class imbalance.

A Cost-sensitive Genetic Programming Approach for High-dimensional Unbalanced Classification

This paper investigates how cost-sensitive learning can be effectively used by GP to address the problem of class imbalance in high-dimensional unbalanced classification.

Reusing Genetic Programming for Ensemble Selection in Classification of Unbalanced Data

This paper develops a two-step approach to evolving ensembles using genetic programming (GP) for unbalanced data and proposes a novel ensemble selection approach using GP to automatically find/choose the best individuals for the ensemble.

A Threshold-free Classification Mechanism in Genetic Programming for High-dimensional Unbalanced Classification

Experimental results indicate that the proposed GP method often performs better than other GP methods that use a fitness function to solve the issue of class imbalance, in terms of classification performance and training time.

A Multiobjective Genetic Programming-Based Ensemble for Simultaneous Feature Selection and Classification

We present an integrated algorithm for simultaneous feature selection (FS) and designing of diverse classifiers using a steady state multiobjective genetic programming (GP), which minimizes three



Differentiating between individual class performance in Genetic Programming fitness for classification with unbalanced data

Improvements to the fitness function in genetic programming are investigated to better solve binary classification problems with unbalanced data, and four new fitness functions are developed that consider the accuracy of the majority and minority classes separately.

Fitness Functions in Genetic Programming for Classification with Unbalanced Data

This paper describes a genetic programming (GP) approach to binary classification with class imbalance problems and shows that when using the overall classification accuracy as the fitness function, the GP system is strongly biased toward the majority class.
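
The bias toward the majority class can be illustrated with a small sketch that compares overall accuracy with the average of the two per-class accuracies. This is a generic illustration, not the fitness functions developed in the papers listed here; the function names, the class labels (0 for majority, 1 for minority), and the data are all hypothetical.

```python
def overall_accuracy(labels, predictions):
    """Fraction of all examples classified correctly."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def average_class_accuracy(labels, predictions, minority=1, majority=0):
    """Mean of the per-class accuracies, so each class counts equally."""
    def class_accuracy(cls):
        members = [(y, p) for y, p in zip(labels, predictions) if y == cls]
        return sum(1 for y, p in members if y == p) / len(members)
    return 0.5 * (class_accuracy(minority) + class_accuracy(majority))


# Hypothetical data: 95 majority examples (0) and 5 minority examples (1).
labels = [0] * 95 + [1] * 5
# A degenerate classifier that always predicts the majority class.
always_majority = [0] * 100

acc = overall_accuracy(labels, always_majority)
bal = average_class_accuracy(labels, always_majority)
print(acc, bal)
```

Under these assumptions the degenerate classifier scores highly on overall accuracy while getting every minority example wrong, whereas the per-class average exposes the failure.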

A Comparison of Classification Strategies in Genetic Programming with Unbalanced Data

There is no overall difference between the two GP classification strategies, and both strategies can evolve good solutions in binary classification when used in combination with an effective fitness function.

GP Classification under Imbalanced Data sets: Active Sub-sampling and AUC Approximation

A 'Simple Active Learning Heuristic' (SALH) in which a subset of exemplars is sampled with uniform probability under a class balance enforcing rule for fitness evaluation is proposed and demonstrated to be both efficient and effective at providing solutions maximizing performance assessed in terms of AUC.
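
A minimal sketch of uniform sampling under a class-balance rule is shown below. It assumes the rule is "draw the same number of examples from each class", which is an assumption made for illustration; the actual details of SALH are not given here. The function name, parameters, and data are hypothetical.

```python
import random


def balanced_subsample(examples, labels, per_class, rng):
    """Draw up to `per_class` examples uniformly at random, without
    replacement, from each class and return (example, label) pairs."""
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    subset = []
    for cls, members in by_class.items():
        k = min(per_class, len(members))  # a class may have fewer examples
        subset.extend((x, cls) for x in rng.sample(members, k))
    return subset


# Hypothetical data: 95 majority examples (0) and 5 minority examples (1).
rng = random.Random(0)
examples = list(range(100))
labels = [0] * 95 + [1] * 5
subset = balanced_subsample(examples, labels, per_class=5, rng=rng)
print(len(subset))
```

Fitness would then be evaluated on `subset` rather than on the full training set, so each class contributes equally and each evaluation touches fewer examples.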

Evolving ensembles in multi-objective genetic programming for classification with unbalanced data

A multi-objective genetic programming approach uses negative correlation learning to evolve accurate and diverse ensembles of non-dominated solutions whose members vote on class membership; it achieves high accuracy on both classes on six unbalanced binary data sets.

Class imbalance problem in UCS classifier system: fitness adaptation

This work studies UCS, a rule-based classifier system which learns under a supervised learning scheme, and finds UCS fairly sensitive to high levels of class imbalance, to the degree that UCS tends to evolve a simple model of the feature space classified according to the majority class.

Representing classification problems in genetic programming

The results show that the dynamic range selection method is well-suited to the task of multi-class classification and is capable of producing classifiers that are more accurate than the other methods tried when comparable training times are allowed.

Ensemble Approach for the Classification of Imbalanced Data

This work proposes to consider a large number of relatively small and balanced subsets, in which representatives from the larger pattern are selected randomly, and produces a matrix of linear regression coefficients whose rows represent random subsets and whose columns represent features.

Advanced Genetic Programming Based Machine Learning

This paper presents a genetic programming based approach to classification that interprets classification problems as optimization problems and finds a solution with a heuristic optimization algorithm.

Using enhanced genetic programming techniques for evolving classifiers in the context of medical diagnosis

The enhanced genetic programming approach presented here is able to produce comparable or even better results than linear modeling methods, artificial neural networks, kNN classification, support vector machines and also various genetic programming approaches.