Corpus ID: 16326925

Cost-Sensitive Learning vs. Sampling: Which is Best for Handling Unbalanced Classes with Unequal Error Costs?

@inproceedings{Weiss2007CostSensitiveLV,
  title={Cost-Sensitive Learning vs. Sampling: Which is Best for Handling Unbalanced Classes with Unequal Error Costs?},
  author={G. Weiss and Kate McCarthy and Bibi Zabar},
  booktitle={DMIN},
  year={2007}
}
The classifier built from a data set with a highly skewed class distribution generally predicts the more frequently occurring classes much more often than the infrequently occurring classes. [...] The first method incorporates the misclassification costs into the learning algorithm, while the other two methods employ oversampling or undersampling to make the training data more balanced. In this paper we empirically compare the effectiveness of these methods in order to determine which produces the…
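The three strategies the paper compares can be illustrated with a minimal sketch (this is illustrative code, not from the paper; the helper names and the 90/10 class split are assumptions):

```python
import random

def oversample(majority, minority, rng):
    """Randomly duplicate minority examples until the classes are balanced."""
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def undersample(majority, minority, rng):
    """Randomly discard majority examples until the classes are balanced."""
    kept = rng.sample(majority, len(minority))
    return kept + minority

rng = random.Random(0)
majority = [("x%d" % i, 0) for i in range(90)]  # frequent class
minority = [("y%d" % i, 1) for i in range(10)]  # rare class

balanced_over = oversample(majority, minority, rng)    # 90 + 90 examples
balanced_under = undersample(majority, minority, rng)  # 10 + 10 examples

# The cost-sensitive alternative leaves the data untouched and instead
# hands the learner per-class misclassification costs to minimize
# (the 9:1 ratio below mirrors the 90/10 imbalance and is illustrative).
costs = {0: 1.0, 1: 9.0}
```

Both sampling routes change the training distribution; the cost-sensitive route changes the objective instead.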
Cost-Based Sampling of Individual Instances
TLDR
A general sampling approach that assigns weights to individual instances according to the cost function helps reveal the relationship between classification performance and class ratios, and allows the identification of an appropriate class distribution for which the learning method achieves reasonable performance on the data.
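The instance-weighting idea described above can be sketched as sampling each example with probability proportional to its misclassification cost (a toy interpretation of the TLDR, not the paper's actual algorithm; the function name and 4:1 cost ratio are assumptions):

```python
import random

def cost_proportional_sample(examples, class_costs, n, rng):
    """Draw a training sample in which each instance is selected with
    probability proportional to the misclassification cost of its class."""
    weights = [class_costs[label] for _, label in examples]
    return rng.choices(examples, weights=weights, k=n)

rng = random.Random(1)
# 80 majority-class and 20 minority-class examples; minority errors cost 4x.
data = [("a%d" % i, 0) for i in range(80)] + [("b%d" % i, 1) for i in range(20)]
sample = cost_proportional_sample(data, {0: 1.0, 1: 4.0}, 1000, rng)
```

With total weights of 80 vs. 80, the resampled set is roughly class-balanced even though the source data is 4:1 imbalanced.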
A Comparative Study of Data Sampling and Cost Sensitive Learning
TLDR
This work investigates the performance of two cost-sensitive learning techniques and four data sampling techniques for minimizing classification costs when data are imbalanced, presenting a comprehensive suite of experiments utilizing 15 datasets with 10 cost ratios.
A Monte Carlo study on methods for handling class imbalance
Many applications of classification problems in machine learning involve class imbalance—a situation where the class of interest (the “minority” or “positive” class) makes up a very small percentage…
Automatically countering imbalance and its empirical relationship to cost
TLDR
A wrapper paradigm is proposed that discovers the amount of re-sampling for a data set by optimizing evaluation functions such as the f-measure, area under the ROC curve, cost, cost curves, and cost-dependent f-measures, allowing it to outperform cost-sensitive classifiers in a cost-sensitive environment.
Variance Ranking Attributes Selection Techniques for Binary Classification Problem in Imbalance Data
TLDR
A novel similarity measurement technique, ranked order similarity (ROS), is used to evaluate variance ranking attribute selection in comparison with the Pearson correlation and information gain techniques, and shows better results than the benchmarks.
Cost-Sensitive Universum-SVM
  • S. Dhar, V. Cherkassky
  • Computer Science
  • 2012 11th International Conference on Machine Learning and Applications
  • 2012
TLDR
This paper extends the U-SVM for problems with different misclassification costs, and presents practical conditions for the effectiveness of the cost-sensitive U-SVM.
Minimax Modifications of Linear Discriminant Analysis for Classification with Rare Classes
TLDR
Cost-efficient modifications of Linear Discriminant Analysis are presented that mitigate the problem of classification for imbalanced samples with rare classes by minimizing the maximal classification error among the classes.
An Optimized Cost-Sensitive SVM for Imbalanced Data Learning
TLDR
An effective wrapper framework is presented that incorporates the evaluation measures (AUC and G-mean) directly into the objective function of a cost-sensitive SVM, improving classification performance by simultaneously optimizing the best combination of feature subset, intrinsic parameters, and misclassification cost parameters.
Undersampling Near Decision Boundary for Imbalance Problems
TLDR
A novel undersampling method, UnderSampling using Sensitivity (USS), is proposed based on the sensitivity of each majority example; experiments confirm the superiority of USS over one baseline method and five resampling methods.
The OCS-SVM: An Objective-Cost-Sensitive SVM With Sample-Based Misclassification Cost Invariance
TLDR
Inspired by the concept of the CS-SVM, a new SVM with sample-based misclassification cost invariance is proposed with the aim of constructing a relatively reliable classifier, defined as one for which the probability of finding a classifier that correctly classifies each misclassified sample is low.

References

C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling beats Over-Sampling
TLDR
This paper shows that using C4.5 with undersampling establishes a reasonable standard for algorithmic comparison, and recommends that the cheapest-class classifier be part of that standard, as it can be better than undersampling for relatively modest costs.
SMOTE: Synthetic Minority Over-sampling Technique
TLDR
A combination of over-sampling the minority (abnormal) class and under-sampling the majority class can achieve better classifier performance (in ROC space); the methods are evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
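The core SMOTE step is generating a synthetic minority example by interpolating between a real minority sample and one of its minority-class neighbors. A minimal sketch of that step (the function name is mine, and real SMOTE picks among k nearest neighbors rather than only the single nearest):

```python
import math
import random

def smote_point(sample, neighbors, rng):
    """Create one synthetic minority example by interpolating between a
    real minority sample and its nearest minority-class neighbor."""
    nn = min(neighbors, key=lambda q: math.dist(sample, q))
    gap = rng.random()  # random position along the segment sample -> nn
    return tuple(s + gap * (q - s) for s, q in zip(sample, nn))

rng = random.Random(42)
minority = [(1.0, 1.0), (1.2, 0.8), (2.0, 2.1)]
synthetic = smote_point(minority[0], minority[1:], rng)
```

The synthetic point always lies on the line segment between the two real examples, which is what lets SMOTE widen minority-class decision regions instead of merely duplicating points.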
C4.5 and Imbalanced Data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure
TLDR
This paper studies the quality of probabilistic estimates, pruning, and preprocessing the imbalanced data set by over- or undersampling methods such that a fairly balanced training set is provided to the decision trees.
Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction
TLDR
A "budget-sensitive" progressive sampling algorithm is introduced for selecting training examples based on the class associated with each example, and it is shown that the class distribution of the resulting training set yields classifiers with good (nearly optimal) classification performance.
The Foundations of Cost-Sensitive Learning
TLDR
It is argued that changing the balance of negative and positive training examples has little effect on the classifiers produced by standard Bayesian and decision tree learning methods; the recommended way of applying one of these methods is to learn a classifier from the training set and then compute optimal decisions explicitly using the probability estimates given by the classifier.
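The recommendation above reduces to Elkan's cost-derived decision threshold, p* = c_FP / (c_FP + c_FN) (assuming zero cost for correct predictions): predict the positive class whenever the estimated positive probability exceeds p*. A small sketch (function names are mine):

```python
def optimal_threshold(cost_fp, cost_fn):
    """Cost-minimizing probability threshold for predicting positive,
    p* = cost_fp / (cost_fp + cost_fn), assuming correct predictions
    cost nothing."""
    return cost_fp / (cost_fp + cost_fn)

def decide(p_positive, cost_fp, cost_fn):
    """Predict positive iff the classifier's estimated positive
    probability exceeds the cost-derived threshold."""
    return p_positive > optimal_threshold(cost_fp, cost_fn)
```

With equal costs the threshold is the familiar 0.5; when a false negative costs nine times a false positive, the threshold drops to 0.1, so an example with only a 20% positive probability is still classified positive. No resampling of the training data is needed.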
Learning When Data Sets are Imbalanced and When Costs are Unequal and Unknown
The problem of learning from imbalanced data sets, while not the same problem as learning when misclassification costs are unequal and unknown, can be handled in a similar manner. That is, in both…
An iterative method for multi-class cost-sensitive learning
TLDR
This paper empirically evaluates the performance of the proposed method using benchmark data sets and shows that the method generally achieves better results than representative methods for cost-sensitive learning, in terms of predictive performance (cost minimization) and, in many cases, computational efficiency.
Improving classifier utility by altering the misclassification cost ratio
TLDR
By using a hold-out set to identify the "best" cost ratio for learning, this paper is able to take advantage of this behavior and generate classifiers that outperform the accepted strategy of always using the actual cost information during the learning phase.
The class imbalance problem: A systematic study
TLDR
The hypothesis that the class imbalance problem affects not only decision tree systems but also other classification systems, such as Neural Networks and Support Vector Machines, is investigated.