A robust correlation analysis framework for imbalanced and dichotomous data with uncertainty

Chun Sing Lai, Yingshan Tao, Fangyuan Xu, Wing W. Y. Ng, Youwei Jia, Haoliang Yuan, Chao Huang, Loi Lei Lai, Zhao Xu, Giorgio Locatelli. Inf. Sci.

Imbalanced Classification Based on Minority Clustering Synthetic Minority Oversampling Technique With Wind Turbine Fault Detection Application

A minority clustering SMOTE (MC-SMOTE) method is proposed that clusters minority-class samples to improve imbalanced classification performance; the results indicate that MC-SMOTE performs better than the classical SMOTE.
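For reference, the classical SMOTE baseline that MC-SMOTE is compared against creates a synthetic minority sample by interpolating between a minority sample and one of its k nearest minority-class neighbours. The sketch below shows only that classical interpolation step under simple assumptions (Euclidean distance, plain Python lists); it is not MC-SMOTE, whose clustering stage is not described here. The name `smote_sample` and its parameters are hypothetical.

```python
import math
import random


def smote_sample(minority, n_new, k=5, seed=0):
    """Classical SMOTE interpolation (illustrative sketch, not MC-SMOTE).

    minority: list of feature vectors (lists of floats) from the minority class.
    Returns n_new synthetic minority vectors.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Precompute the k nearest minority neighbours of each sample (Euclidean).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(n)            # random base sample
        j = rng.choice(neighbours[i])   # random neighbour of that sample
        gap = rng.random()              # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        # New point lies on the segment between the base sample and its neighbour.
        synthetic.append([b + gap * (m - b) for b, m in zip(base, nb)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority samples, SMOTE never places points outside the convex hull of the minority class.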

Minority Oversampling Using Sensitivity

This paper addresses imbalanced classification using Bayes’ decision rule and proposes a novel oversampling method, Minority Oversampling using Sensitivity (MOSS), in which candidates for generating new examples are selected according to their sensitivity with respect to class imbalance.

Undersampling Near Decision Boundary for Imbalance Problems

A novel undersampling method, UnderSampling using Sensitivity (USS), is proposed based on the sensitivity of each majority example; experiments confirm the superiority of USS over one baseline method and five resampling methods.

Load forecasting based on deep neural network and historical data augmentation

This study presents a novel load forecasting method known as deep neural network and historical data augmentation (DNN–HDA), which utilises HDA to enhance DNN regression for monthly load forecasting, on the premise that historical data are highly correlated with the corresponding predicted data.

Stochastic Sensitivity Tree Boosting for Imbalanced Prediction Problems of Protein-Ligand Interaction Sites

A new boosting algorithm, SSTBoost, combines a stochastic sensitivity measure-based undersampling method with the AdaBoost algorithm; it aims to improve the performance of the base hypotheses by making them complementary and making them work in conjunction with each other.
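SSTBoost builds on AdaBoost, whose core step reweights training examples so that the next base hypothesis concentrates on examples the current one got wrong. The sketch below is one round of standard discrete AdaBoost reweighting for labels in {-1, +1}, assuming the incoming weights sum to 1; it does not include SSTBoost's sensitivity-based undersampling, and the helper name `adaboost_reweight` is hypothetical.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One round of discrete AdaBoost reweighting (labels in {-1, +1}).

    Assumes `weights` sum to 1. Returns (alpha, new_weights), where alpha is
    the vote weight of the current base hypothesis.
    """
    # Weighted training error of the current hypothesis.
    eps = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    eps = min(max(eps, 1e-12), 1 - 1e-12)  # guard against log(0) / division by zero
    alpha = 0.5 * math.log((1 - eps) / eps)

    # Misclassified examples (t * p == -1) are scaled up, correct ones scaled down.
    new = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    z = sum(new)  # normalisation constant
    return alpha, [w / z for w in new]
```

After the update, the misclassified examples together carry exactly half of the total weight, which forces the next base hypothesis to differ from the current one; this is the sense in which boosted hypotheses are made complementary.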

Informativity assessment and attributes selection in a computer system state identification

The scientific novelty of the results lies in the analysis of Windows operating system events, the assessment of their informativeness, and the selection of features for identifying the state of a computer system.

Industrial Data-Driven Monitoring Based on Incremental Learning Applied to the Detection of Novel Faults

A novel multifault detection and identification scheme based on machine learning, information data fusion, novelty detection, and incremental learning is proposed; it is validated under a complete set of experimental scenarios from two different case studies and compared with a classical approach.

Learning from Imbalanced Data

A critical review of the nature of the problem, the state-of-the-art technologies, and the current assessment metrics used to evaluate learning performance under the imbalanced learning scenario is provided.

Comparing Oversampling Techniques to Handle the Class Imbalance Problem: A Customer Churn Prediction Case Study

The empirical results demonstrate that MTDF and rule generation based on genetic algorithms achieved the best overall predictive performance compared with the other evaluated oversampling methods and rule-generation algorithms.

A Novel Machine Learning Approach Toward Quality Assessment of Sensor Data

Experimental results reveal that the proposed ensemble classification framework agrees with expert judgement with high accuracy and achieves better classification performance than other state-of-the-art approaches.

SVMs Modeling for Highly Imbalanced Classification

Of the four SVM variations considered in this paper, the novel granular SVMs-repetitive undersampling algorithm (GSVM-RU) is the best in terms of both effectiveness and efficiency.

Diversified Sensitivity-Based Undersampling for Imbalance Classification Problems

A diversified sensitivity-based undersampling method is proposed that iteratively clusters the data and samples a balanced set of samples yielding high classifier sensitivity; it shows good generalization capability on 14 UCI datasets.

RUSBoost: A Hybrid Approach to Alleviating Class Imbalance

This paper presents RUSBoost, a new hybrid sampling/boosting algorithm for learning from skewed training data. It provides a simpler and faster alternative to SMOTEBoost, another algorithm that combines boosting and data sampling.
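The sampling component of RUSBoost is random undersampling (RUS): majority-class examples are discarded at random, which is cheaper than synthesizing new minority examples as SMOTEBoost does. The sketch below shows plain random undersampling to a 1:1 class ratio for binary labels; the boosting loop and example weights used by RUSBoost are omitted, and the helper name `random_undersample` and the 1:1 target are assumptions for illustration.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class samples until both classes have equal size.

    Illustrative sketch of random undersampling (RUS) for binary labels.
    Returns the reduced (X, y) in their original order.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    minority_label = min(counts, key=counts.get)
    n_min = counts[minority_label]

    min_idx = [i for i, label in enumerate(y) if label == minority_label]
    maj_idx = [i for i, label in enumerate(y) if label != minority_label]

    # Keep every minority sample; sample the majority class without replacement.
    kept = sorted(min_idx + rng.sample(maj_idx, n_min))
    return [X[i] for i in kept], [y[i] for i in kept]
```

Because no new samples are generated, each boosting round trains on a smaller set than the original, which is where the speed advantage over SMOTE-based boosting comes from.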

Dynamic Sampling Approach to Training Neural Networks for Multiclass Imbalance Classification

Results on 20 multiclass imbalanced data sets show that DyS can outperform the compared methods, including pre-sample methods, active learning methods, cost-sensitive methods, and boosting-type methods.

Multiclass Imbalance Problems: Analysis and Potential Solutions

Shuo Wang, X. Yao. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2012.
AdaBoost.NC is shown to be better at recognizing minority class examples and balancing the performance among classes in terms of G-mean without using any class decomposition, and is applied to several real-world multiclass imbalance tasks and compared to other popular ensemble methods.
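G-mean, the measure used above to assess balance across classes, is the geometric mean of the per-class recalls, so a single class with zero recall drives the whole score to zero. A minimal sketch of the multiclass version, assuming plain label lists; the function name `g_mean` is hypothetical.

```python
import math
from collections import defaultdict


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (illustrative sketch).

    Classes are taken from y_true; each recall is correct / total for that class.
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    # Geometric mean: the n-th root of the product of n recalls.
    return math.prod(recalls) ** (1.0 / len(recalls))
```

For example, per-class recalls of 0.75, 0.5, and 1.0 give a G-mean of about 0.72, whereas their arithmetic mean would be 0.75; the geometric mean penalizes the weaker class more heavily.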

Using Class Imbalance Learning for Software Defect Prediction

This paper investigates different types of class imbalance learning methods, including resampling techniques, threshold moving, and ensemble algorithms, and concludes that AdaBoost.NC shows the best overall performance in terms of measures including balance, G-mean, and Area Under the Curve (AUC).
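The balance measure combines the probability of detection (pd, recall on defective modules) and the probability of false alarm (pf) into one score, based on the normalised Euclidean distance from the ideal point (pf = 0, pd = 1). The sketch below assumes the definition commonly used in the software defect prediction literature, balance = 1 - sqrt(pf^2 + (1 - pd)^2) / sqrt(2), computed from confusion-matrix counts with "defective" as the positive class; the function name `balance` is hypothetical.

```python
import math


def balance(tp, fn, fp, tn):
    """Balance measure from confusion-matrix counts (illustrative sketch).

    Assumes the common definition: 1 - distance to (pf=0, pd=1) / sqrt(2).
    """
    pd = tp / (tp + fn)  # probability of detection (recall on defective modules)
    pf = fp / (fp + tn)  # probability of false alarm
    return 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)
```

A perfect predictor (pd = 1, pf = 0) scores 1, and the worst case (pd = 0, pf = 1) scores 0, so balance rewards high detection only when it is not bought with many false alarms.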