Corpus ID: 43989363

How to Deal with Large Dataset, Class Imbalance and Binary Output in SVM based Response Model

@inproceedings{Shin2003HowTD,
  title={How to Deal with Large Dataset, Class Imbalance and Binary Output in SVM based Response Model},
  author={Hyunjung Shin},
  year={2003}
}
Support Vector Machines (SVMs) employ the Structural Risk Minimization (SRM) principle to generalize better than conventional machine learning methods that employ the traditional Empirical Risk Minimization (ERM) principle. When applying SVMs to response modeling in direct marketing, however, one has to deal with three practical difficulties: a large training dataset, class imbalance, and the binary SVM output. This paper proposes ways to alleviate or solve these difficulties through informative sampling…
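The abstract names class imbalance and binary SVM output as practical obstacles. A common way to address both — not the paper's own method, just a minimal sketch assuming scikit-learn — is to reweight the rare "responder" class in the SVM objective and use the real-valued decision score to rank customers instead of relying on the hard binary label:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Toy imbalanced binary dataset (95% non-responders, 5% responders),
# mimicking a direct-marketing response-modeling setting.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# class_weight="balanced" scales the misclassification penalty inversely
# to class frequency, so errors on the rare responder class cost more.
clf = SVC(kernel="rbf", class_weight="balanced").fit(X, y)

# decision_function yields a real-valued margin score per customer, which
# can rank prospects by likelihood of response rather than emitting only
# a hard 0/1 label.
scores = clf.decision_function(X)
print(scores.shape)  # one score per customer
```

Ranking by margin score is one standard workaround for the binary-output issue in response modeling; the dataset, parameters, and weighting scheme above are illustrative assumptions.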
Kernel Machines for Imbalanced Data Problem in Biomedical Applications
TLDR: This chapter shows that a hybrid kernel machine comprising one-class SVMs and binary SVMs in a multi-classifier system alleviates the imbalanced data problem, and reports the deployment of such hybrid kernel machines in two biomedical applications where the imbalanced data problem exists.
Multi-level SVM Based CAD Tool for Classifying Structural MRIs
TLDR: This work proposes a CAD tool for differentiating neural lesions caused by CVA from lesions caused by other neural disorders, using Non-negative Matrix Factorisation (NMF) and Haralick features for feature extraction and a Support Vector Machine (SVM) for pattern recognition.
Support vector machine classification applied to the parametric design of centrifugal pumps
TLDR: Numerical tests show that the addition of this classification tool considerably reduces the number of CFD computations required for the design, providing large savings in computational time.
Influence Relevance Voting: An Accurate And Interpretable Virtual High Throughput Screening Method
TLDR: The Influence Relevance Voter is a low-parameter neural network which refines a k-nearest neighbor classifier by nonlinearly combining the influences of a chemical's neighbors in the training set, and presents several important advantages over SVMs and other methods.
Estimation of nonparametric probability density functions with applications to automatic speech recognition
TLDR: This thesis investigates and develops methods to efficiently train sparse kernel PDF models by regression of the empirical cumulative distribution function (ECDF), and presents a novel machine learning software library which demonstrates the new models' improved generalization ability on small-sample problems.
Two Tier Prediction of Stroke Using Artificial Neural Networks and Support Vector Machines
TLDR: A two-tier system for predicting stroke: the first tier uses an Artificial Neural Network (ANN) to predict the chances of a person suffering from stroke, and the second tier uses Non-negative Matrix Factorization and Haralick textural features for feature extraction with an SVM classifier for classification.
The Influence Relevance Voter: An Accurate And Interpretable Virtual High Throughput Screening Method
Given activity training data from High-Throughput Screening (HTS) experiments, virtual High-Throughput Screening (vHTS) methods aim to predict in silico the activity of untested chemicals. We…
Supervised learning for infection risk inference using pathology data
TLDR: Six biochemical markers, all commonly available in hospitals, comprise enough information to perform infection risk inference with a high degree of confidence, even in the presence of incomplete and imbalanced data.
Chapter 6 Virtual High-Throughput Screening with Two-Dimensional Kernels
High-Throughput Screening (HTS) is an important technology that relies on massively testing large numbers of compounds for their activity on a given assay in order to identify potential drug leads in…
A Machine Learning System for Understanding Appraisal in Design Documents
TLDR: This thesis describes a machine learning system for understanding appraisal in design documents that expresses the positive or negative stance of the author toward the semantic meaning of the document.

References

Showing 1–10 of 29 references
Pattern Selection for Support Vector Classifiers
TLDR: A k-nearest neighbors (k-NN) based pattern selection method that tries to select the patterns that are near the decision boundary and that are correctly labeled, reducing training time by discarding redundant patterns.
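The pattern-selection idea summarized above — keep only points near the decision boundary whose labels look reliable — can be illustrated with a short sketch. This is not the authors' exact rule; the heuristic below (neighborhood label heterogeneity plus agreement with the neighborhood majority) is an assumption made for illustration, using scikit-learn's NearestNeighbors:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_boundary_patterns(X, y, k=5):
    """Illustrative k-NN pattern pre-selection: keep points whose k nearest
    neighbors have mixed labels (near the boundary) and whose own label
    agrees with the neighborhood majority (likely correctly labeled)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)           # idx[:, 0] is the query point itself
    neigh_labels = y[idx[:, 1:]]        # labels of the k true neighbors
    pos_frac = (neigh_labels == 1).mean(axis=1)
    mixed = (pos_frac > 0) & (pos_frac < 1)   # heterogeneous neighborhood
    majority = (pos_frac >= 0.5).astype(int)  # neighborhood label vote
    agrees = (y == majority)                  # point matches the vote
    return np.where(mixed & agrees)[0]

# Toy data: two overlapping Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
kept = select_boundary_patterns(X, y, k=5)
print(len(kept), "of", len(y), "patterns kept for SVM training")
```

Interior points with homogeneous neighborhoods are dropped, since they are unlikely to become support vectors; only boundary-region points survive.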
Fast Pattern Selection for Support Vector Classifiers
TLDR: A k-nearest neighbors (k-NN) based pattern selection method that tries to select the patterns that are near the decision boundary and that are correctly labeled, reducing training time by discarding redundant patterns.
How many neighbors to consider in pattern pre-selection for support vector classifiers?
  • Hyunjung Shin, S. Cho
  • Mathematics
  • Proceedings of the International Joint Conference on Neural Networks, 2003.
  • 2003
Training support vector classifiers (SVC) requires large memory and long CPU time when the pattern set is large. To alleviate the computational burden in SVC training, we previously proposed a…
Classification of unbalanced data with transparent kernels
TLDR: The aim is to build data-driven classifiers that provide good predictive performance for a set of unbalanced data and enhance the understanding of a model by enabling input/output dependencies to be visualised.
Sample selection via clustering to construct support vector-like classifiers
TLDR: Simulation results for well-known classification problems show very good performance of the corresponding designs, improving on support vector machines while substantially reducing the number of units, which justifies the interest in selecting samples (or centroids) efficiently.
Learning from Imbalanced Data Sets: A Comparison of Various Strategies *
Although the majority of concept-learning systems previously designed usually assume that their training sets are well-balanced, this assumption is not necessarily correct. Indeed, there exist many…
Knowledge discovery in a direct marketing case using least squares support vector machines
TLDR: This study investigates the detection and qualification of the most relevant explanatory variables for predicting purchase incidence using Belgian data, and extends beyond the standard recency/frequency/monetary (RFM) modeling semantics by including alternative operationalizations of the RFM variables and by adding several other (non-RFM) predictors.
SVM-KM: speeding SVMs learning with a priori cluster selection and k-means
TLDR: The number of vectors in SVM training is smaller, and the training time can be decreased without compromising the generalization capability of the SVM.
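The cluster-based reduction summarized above can be sketched briefly. The details below are assumptions in the spirit of SVM-KM rather than its exact algorithm: cluster each class with k-means and train the SVM only on the sample nearest each centroid, shrinking the training set before the expensive SVM fit:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# For each class, cluster with k-means and keep one representative
# sample per cluster (the point nearest the centroid).
selected = []
for label in np.unique(y):
    mask = np.where(y == label)[0]
    Xc = X[mask]
    km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(Xc)
    # squared distance from every sample to every centroid
    d = ((Xc[:, None, :] - km.cluster_centers_[None, :, :]) ** 2).sum(-1)
    reps = np.unique(d.argmin(axis=0))  # nearest sample per centroid
    selected.extend(mask[reps])

selected = np.array(selected)
# Train the SVM on the reduced set instead of all 1000 points.
clf = SVC(kernel="rbf").fit(X[selected], y[selected])
print(len(selected), "training points instead of", len(X))
```

The cluster count (20 per class) is an arbitrary illustrative choice; the published method selects samples from clusters more carefully to preserve likely support vectors.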
The training of neural classifiers with condensed datasets
TLDR: A k-nearest-neighbor-based data condensing algorithm is applied to the training set of multilayer perceptron neural networks to significantly speed the network training time while achieving an undegraded misclassification rate compared to a network trained on the unedited training set.
Response models based on bagging neural networks
TLDR: A systematic method of combining neural networks is proposed, namely bagging (bootstrap aggregating), whereby overfitted multiple neural networks are trained with bootstrap replicas of the original data set and then averaged.