A Survey of Methods for Managing the Classification and Solution of Data Imbalance Problem

Khan Md. Hasib, Md. Sadiq Iqbal, Faisal Muhammad Shah, Jubayer Al Mahmud, Mahmudul Hasan Popel, Md. Imran Hossain Showrov, Shakil Ahmed, Obaidur Rahman

Class imbalance is a widespread problem in numerous real-world applications. In this situation, nearly all of the examples are labeled as one class, called the majority class, while far fewer examples are labeled as the other, usually more important, class, called the minority class. Over the last few years, research on class imbalance has covered data sampling, cost-sensitive analysis, Genetic Programming based models…

A boosting based approach to handle imbalanced data

A novel boosting-based algorithm for learning from imbalanced datasets is proposed; it combines the proposed Peak under-sampling algorithm with the SMOTE over-sampling technique inside the boosting procedure.

Combining SMOTE and OVA with Deep Learning and Ensemble Classifiers for Multiclass Imbalanced

The proposed hybrid method using the stacking algorithm achieved higher accuracy than other methods on the car, pageblocks, and Ecoli datasets, and reached its highest classification performance, 98.47%, on the dermatology dataset with random forest as the classifier.

Tomek Link and SMOTE Approaches for Machine Fault Classification with an Imbalanced Dataset

This research investigates the use of a Naïve Bayes classifier, support vector machine, and k-nearest neighbors together with synthetic minority oversampling technique, Tomek link, and the combination of these two resampling techniques for fault classification with simulation and experimental imbalanced data for condition monitoring on a wound-rotor induction generator.
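
The Tomek link idea mentioned in this summary can be illustrated with a minimal NumPy sketch. It is not taken from the paper: the function name `tomek_links` and the brute-force distance computation are assumptions made for illustration. A Tomek link is taken here to be a pair of samples from different classes that are each other's nearest neighbor.

```python
import numpy as np

def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    Hypothetical helper for illustration: a pair is a Tomek link when the
    two samples are mutual nearest neighbors and carry different labels.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Brute-force pairwise Euclidean distances (fine for small data only).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    nn = d.argmin(axis=1)        # nearest neighbor of every sample
    links = []
    for i, j in enumerate(nn):
        # i < j reports each mutual pair once.
        if i < j and nn[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links
```

A common cleaning step, under these assumptions, is to drop the majority-class member of each returned pair before training.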

Active Learning with an Adaptive Classifier for Inaccessible Big Data Analysis

A framework that uses a support vector machine (SVM) within active learning (AL) is proposed for mining big data when some data is inaccessible; the proposed method is found to increase the efficiency of AL classifiers while using fewer training instances.

A Machine Learning and Explainable AI Approach for Predicting Secondary School Student Performance

A predictive model of student success in secondary education is built using five classification algorithms: Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), XGBoost, and Naive Bayes. The data is gathered from two Portuguese school reports and surveys.

BMNet-5: A Novel Approach of Neural Network to Classify the Genre of Bengali Music Based on Audio Features

A technique called BMNet-5 is developed for multiclass classification of Bangla music genres such as “Bangla Adhunik,” “Bangla Hip-Hop,” and “Nazrulgeeti”; the suggested model is a neural network designed to predict music genre from audio inputs.

COVID-19 Prediction based on Infected Cases and Deaths of Bangladesh using Deep Transfer Learning

This study aims to forecast upcoming COVID-19 cases and fatalities from a time series dataset using a proposed deep transfer learning model, in which an encoder-decoder CNN-LSTM is used along with pretrained deep CNN models such as ResNet-50, DenseNet-201, MobileNet-V2, and Inception-ResNet-V2.

A Novel Deep Learning based Sentiment Analysis of Twitter Data for US Airline Service

A novel deep learning model is proposed that combines different word embeddings with deep learning methods for multi-class sentiment analysis of a dataset of tweets about six major US airlines.

News Classification from Microblogging Dataset using Supervised Learning

This paper proposes a model to identify news in a Twitter dataset and find the best outcome for the microblogging dataset; the work begins with basic data crawling, applies four supervised learning algorithms, and ends with the selection of the best one.

3D Gesture Recognition and Adaptation for Human–Robot Interaction

A Kinect-based 3D gesture recognition and adaptation system for human-robot interaction is presented; it recognizes pointing gestures in real time and can adapt to new and unrecognized gestures through semi-supervised self-adaptation or user consent-based adaptation.



A Survey on Methods for Solving Data Imbalance Problem for Classification

This survey presents various methods that researchers have introduced to handle the data imbalance problem and improve classification performance, and compares the methods by their advantages and disadvantages.

A Novel Distribution Analysis for SMOTE Oversampling Method in Handling Class Imbalance

This paper presents a theoretical and experimental analysis of the Synthetic Minority Oversampling TEchnique (SMOTE) and explores how faithfully SMOTE emulates the underlying density.
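
For orientation, the interpolation step behind SMOTE can be sketched as follows. This is a simplified illustration, not the analysis or implementation from the paper: the function name `smote_sketch`, its parameters, and the brute-force neighbor search are assumptions, and the sketch expects at least two minority samples.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (illustrative sketch).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbors.
    Assumes X_min holds at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot use more neighbors than other samples
    # Brute-force pairwise distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # sample to start from
    pick = rng.integers(0, k, size=n_new)  # which of its neighbors to use
    nbr = neighbors[base, pick]
    gap = rng.random((n_new, 1))           # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])
```

Because every synthetic point is a convex combination of two existing minority samples, the generated data stays inside the region those samples already span, which is one reason the question of how well it matches the true density arises.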

Comparing the Behavior of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise

This paper compares the oversampling and undersampling approaches to class imbalance learning in a noisy environment and tries to determine which approach is better in that case.

An Ensemble Learning Imbalanced Data Classification Method Based on Sample Combination Optimization

Yuxing Wang. Computer Science. Journal of Physics: Conference Series. 2019.
It is proved that GABagging can compensate for the shortcomings of related Bagging-based methods such as easy loss, increasing samples and not guaranteeing the validity and existence of classification boundaries after sampling.

Variance Ranking Attributes Selection Techniques for Binary Classification Problem in Imbalance Data

A novel similarity measurement technique, ranked order similarity (ROS), is used to evaluate variance ranking attribute selection against the Pearson correlation and information gain techniques, and variance ranking shows better results than these benchmarks.

A survey on addressing high-class imbalance in big data

This paper provides a large survey of published studies within the last 8 years, focusing on high-class imbalance in big data in order to assess the state-of-the-art in addressing adverse effects due to class imbalance.

RUSBoost: A Hybrid Approach to Alleviating Class Imbalance

This paper presents a new hybrid sampling/boosting algorithm, called RUSBoost, for learning from skewed training data, which provides a simpler and faster alternative to SMOTEBoost, which is another algorithm that combines boosting and data sampling.

Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data

An adaptive multiple classifier system named AMCS is proposed to cope with multi-class imbalanced learning; it distinguishes among different kinds of imbalanced data and is applied to oil-bearing reservoir recognition.