Evaluating boosting algorithms to classify rare classes: comparison and improvements

Mahesh V. Joshi, Vipin Kumar, Ramesh C. Agarwal
Proceedings 2001 IEEE International Conference on Data Mining
Classification of rare events has many important data mining applications. [...] We propose enhanced algorithms in two of the categories and justify their choice of weight updating parameters theoretically. Using specially designed synthetic datasets, we compare the capability of all the algorithms from the rare class perspective. The results support our qualitative analysis and also indicate that our enhancements bring an extra capability for achieving better balance between recall and…
Predicting rare classes: can boosting make any weak learner strong?
The analysis indicates that one cannot be blind to base learner performance and simply rely on the boosting mechanism to compensate for its weakness; the arguments are validated empirically on a variety of real and synthetic rare class problems.
Multiple Classifier Prediction Improvements against Imbalanced Datasets through Added Synthetic Examples
The experimental results indicate that the DataBoost-IM algorithm surpasses a benchmarking individual classifier as well as a popular boosting method when evaluated in terms of overall accuracy, the G-mean, and the F-measures.
Local decomposition for rare class analysis
Proposes a method for Classification using lOcal clusterinG (COG), which produces significantly higher prediction accuracies on rare classes than state-of-the-art methods and can also improve the performance of traditional supervised learning algorithms on data sets with balanced class distributions.
Boosting methods for multi-class imbalanced data classification: an experimental review
The experimental studies show that the CatBoost and LogitBoost algorithms are superior to other boosting algorithms on multi-class imbalanced conventional and big datasets, respectively, and that the MMCC is a better evaluation metric than the MAUC and G-mean in multi-class imbalanced data domains.
Predicting Rare Classes: Comparing Two-Phase Rule Induction to Cost-Sensitive Boosting
This paper qualitatively argues that the ability to identify the relevant false positives is not guaranteed by the boosting methodology. It simulates learning scenarios of varying difficulty to demonstrate that this fundamental qualitative difference between the two mechanisms results in the existence of many scenarios in which PNrule achieves comparable or significantly better performance than AdaCost, a strong cost-sensitive boosting algorithm.
Cost-sensitive boosting for classification of imbalanced data
Three cost-sensitive boosting algorithms are developed by introducing cost items into the learning framework of AdaBoost; one of the proposed algorithms is shown to tally with stagewise additive modelling in statistics to minimize the cost exponential loss.
Improving classification performance for the minority class in highly imbalanced dataset using boosting
Data imbalance is a common property in many medical and biological data and usually results in degraded generalization performance. In this article, we present a novel boosting method to address two…
Improving the classification performance of biological imbalanced datasets by swarm optimization algorithms
An optimization model using different swarm strategies (Bat-inspired algorithm and PSO) is proposed for adaptively balancing the increase/decrease of the class distribution, depending on the properties of the biological datasets, and it outperforms other class balancing methods in medical data classification.
Imbalance class problems in data mining: a review
A comprehensive survey is performed to identify the challenges of handling imbalanced class problems during the classification process using machine learning algorithms, and viable solutions and potential future directions are provided for handling these problems.
An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics
This work carries out a thorough discussion on the main issues related to using data intrinsic characteristics in this classification problem, and introduces several approaches and recommendations to address these problems in conjunction with imbalanced data.
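
Several of the entries above report results in terms of recall, the G-mean, and the F-measure rather than overall accuracy alone. The following is a minimal sketch of why that matters for a rare class, assuming binary labels with 1 marking the rare class and the common definition of the G-mean as the geometric mean of recall and specificity. The function name `rare_class_metrics` and the example data are hypothetical and are not taken from any of the listed papers.

```python
import math


def rare_class_metrics(y_true, y_pred, rare=1):
    """Compute accuracy, precision, recall, F1, and G-mean for the rare class.

    Assumption: G-mean = sqrt(recall * specificity). Individual papers may
    define their metrics differently.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == rare and p == rare)
    fn = sum(1 for t, p in pairs if t == rare and p != rare)
    fp = sum(1 for t, p in pairs if t != rare and p == rare)
    tn = sum(1 for t, p in pairs if t != rare and p != rare)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical data: 100 examples, 5 of which belong to the rare class.
y_true = [1] * 5 + [0] * 95

# A classifier that never predicts the rare class is 95% accurate
# but has zero recall, F1, and G-mean.
always_common = rare_class_metrics(y_true, [0] * 100)

# A classifier that finds 4 of the 5 rare examples at the cost of
# 5 false positives is slightly less accurate but far more useful.
y_pred = [1, 1, 1, 1, 0] + [1] * 5 + [0] * 90
finds_rare = rare_class_metrics(y_true, y_pred)
```

In this sketch the second classifier has lower accuracy than the first (0.94 versus 0.95) while its recall, F1, and G-mean are all well above zero, which is the kind of trade-off these metrics are meant to expose.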


Mining needle in a haystack: classifying rare classes via two-phase rule induction
This paper designs various synthetic data models to identify and analyze the situations in which two state-of-the-art methods, RIPPER and C4.5rules, either fail to learn a model or learn a very poor model, and shows that the proposed two-phase approach learns a model with significantly better recall and precision levels.
Improved Boosting Algorithms using Confidence-Rated Predictions
We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…
Improved Boosting Algorithms Using Confidence-rated Predictions
We describe several improvements to Freund and Schapire's AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a…
An improved boosting algorithm and its application to text categorization
An improved boosting algorithm, called AdaBoost.MH^KR, is described and applied to text categorization, where it is shown to be both more efficient to train and more effective than the original AdaBoost.MH algorithm.
Theoretical Views of Boosting
Focusing primarily on the AdaBoost algorithm, theoretical work on boosting is surveyed, including analyses of AdaBoost's training error and generalization error, connections between boosting and game theory, methods of estimating probabilities using boosting, and extensions of AdaBoost for multiclass classification problems.
Fast Effective Rule Induction
This paper evaluates the recently proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems, and proposes a number of modifications resulting in an algorithm, RIPPERk, that is very competitive with C4.5 and C4.5rules with respect to error rates, but much more efficient on large samples.
A simple, fast, and effective rule learner
We describe SLIPPER, a new rule learner that generates rulesets by repeatedly boosting a simple, greedy rule-builder. Like the rulesets built by other rule learners, the ensemble of rules created by…
Toward Scalable Learning with Non-Uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection
Describes a multi-classifier meta-learning approach to address very large databases with skewed class distributions and non-uniform cost per error; empirical results indicate that the approach can significantly reduce loss due to illegitimate transactions.
AdaCost: Misclassification Cost-Sensitive Boosting
It is formally shown that AdaCost reduces the upper bound of the cumulative misclassification cost on the training set, and that it achieves a significant reduction in cumulative misclassification cost over AdaBoost without consuming additional computing power.
A Simple
In this short note, we demonstrate a simple and practical ORAM that enjoys an extremely simple proof of security. Our construction is based on a recent ORAM due to Shi, Chan, Stefanov and Li…