Corpus ID: 15598582

CHAPTER 2 FOUNDATIONS OF IMBALANCED LEARNING

@inproceedings{Weiss2012CHAPTER2F,
  title={CHAPTER 2 FOUNDATIONS OF IMBALANCED LEARNING},
  author={G. Weiss},
  year={2012}
}
Many important learning problems, from a wide variety of domains, involve learning from imbalanced data. Because this learning task is quite challenging, there has been a tremendous amount of research on this topic over the past fifteen years. However, much of this research has focused on methods for dealing with imbalanced data, without discussing exactly how or why such methods work—or what underlying issues they address. This is a significant oversight, which this chapter helps to address… 


References

Showing 1-10 of 52 references
Extreme re-balancing for SVMs: a case study
TLDR
There is a consistent pattern of performance differences between one and two-class learning for all SVMs investigated, and these patterns persist even with aggressive dimensionality reduction through automated feature selection.
Class imbalances versus small disjuncts
TLDR
It is argued that, in order to improve classifier performance, it may be more useful to focus on the small disjuncts problem than to focus on the class imbalance problem; experiments suggest that the degradation is not directly caused by class imbalance, but rather that class imbalance may yield small disjuncts, which in turn cause the degradation.
Concept-Learning in the Presence of Between-Class and Within-Class Imbalances
TLDR
Random re-sampling is extended to deal simultaneously with the between-class and within-class imbalance problems; addressing both problems simultaneously is shown to be beneficial and should also be done by more sophisticated techniques.
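The within-class problem concerns rare sub-concepts inside a class. Below is a minimal sketch of the combined idea, assuming sub-concepts are approximated by k-means clustering; the clustering step, the function name cluster_balanced_oversample, and its parameters are illustrative simplifications, not the paper's exact procedure.

import numpy as np
from sklearn.cluster import KMeans

def cluster_balanced_oversample(X, y, n_clusters=3, seed=0):
    # Random over-sampling aimed at both kinds of imbalance: every class is
    # grown to the size of the largest class (between-class), and examples are
    # drawn evenly from each sub-cluster of a class (within-class).
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_out, y_out = [], []
    for c in classes:
        Xc = X[y == c]
        k = min(n_clusters, len(Xc))
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Xc)
        per_cluster = target // k
        for j in range(k):
            Xj = Xc[labels == j]
            idx = rng.integers(len(Xj), size=per_cluster)  # sample with replacement
            X_out.append(Xj[idx])
            y_out.append(np.full(per_cluster, c))
    return np.vstack(X_out), np.concatenate(y_out)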
Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction
TLDR
A "budget-sensitive" progressive sampling algorithm is introduced for selecting training examples based on the class associated with each example and it is shown that the class distribution of the resulting training set yields classifiers with good (nearly-optimal) classification performance.
Concept Learning and the Problem of Small Disjuncts
TLDR
Various approaches to this problem are evaluated, including the novel approach of using a bias different from the "maximum generality" bias; these prove partly successful, but the problem of small disjuncts remains open.
SMOTE: Synthetic Minority Over-sampling Technique
TLDR
A combination of over-sampling the minority (abnormal) class and under-sampling the majority class can achieve better classifier performance (in ROC space); the approach is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
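For concreteness, SMOTE creates synthetic minority examples by interpolating between a minority example and one of its k nearest minority-class neighbors. A minimal sketch of that interpolation step follows; the function name smote, the n_synthetic parameter, and the brute-force neighbor search are illustrative, not the reference implementation.

import numpy as np

def smote(minority_X, n_synthetic, k=5, seed=0):
    # Generate synthetic minority examples by interpolating between a sampled
    # minority point and one of its k nearest minority-class neighbors.
    rng = np.random.default_rng(seed)
    n, d = minority_X.shape
    # Brute-force pairwise distances within the minority class only.
    dists = np.linalg.norm(minority_X[:, None, :] - minority_X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # exclude self-matches
    neighbors = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, d))
    for i in range(n_synthetic):
        base = rng.integers(n)                 # pick a minority example at random
        nb = neighbors[base, rng.integers(k)]  # pick one of its k nearest neighbors
        gap = rng.random()                     # interpolation factor in [0, 1)
        synthetic[i] = minority_X[base] + gap * (minority_X[nb] - minority_X[base])
    return synthetic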
SMOTEBoost: Improving Prediction of the Minority Class in Boosting
TLDR
This paper presents a novel approach to learning from imbalanced data sets, based on a combination of the SMOTE algorithm and the boosting procedure, which shows improved prediction performance on the minority class and improved overall F-values.
When small disjuncts abound, try lazy learning: A case study
TLDR
The measures introduced in this paper are useful for predicting the suitability of lazy learning in general and are illustrated on a sample language task, viz. word pronunciation.
A Quantitative Study of Small Disjuncts
TLDR
This paper presents a quantitative measure for evaluating the effect of small disjuncts on learning and uses it to analyze 30 benchmark datasets, yielding several interesting results.
Evolutionary Computation
A. Freitas, Encyclopedia of Machine Learning, 2010
TLDR
This chapter addresses the integration of knowledge discovery in databases (KDD) and evolutionary algorithms (EAs) and suggests that this principle should be followed in other EA applications.