Learning concepts from large scale imbalanced data sets using support cluster machines


This paper considers the problem of using Support Vector Machines (SVMs) to learn concepts from large-scale imbalanced data sets. The objective of this paper is twofold. First, we investigate the effects of large scale and imbalance on SVMs, highlighting the role of linear non-separability in this problem. Second, we develop a practical meta-algorithm with theoretical guarantees to handle the problems of scale and imbalance. The approach, named Support Cluster Machines (SCMs), incorporates informative and representative under-sampling mechanisms to speed up the training procedure. SCMs differ from previous similar ideas in two ways: (a) a theoretical foundation is provided, and (b) clustering is performed in the feature space rather than in the input space. The theoretical analysis not only justifies the approach but also guides its technical choices. Finally, experiments are carried out on both synthetic data and TRECVID data. The results support the preceding analysis and show that SCMs are efficient and effective when dealing with large-scale imbalanced data sets.
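
The abstract names two ingredients, clustering in the feature space and representative under-sampling of the majority class, without giving details. The sketch below illustrates one plausible reading under our own assumptions: a Gaussian (RBF) kernel, kernel k-means as the feature-space clustering step, and each majority-class cluster treated as a single training unit whose kernel value with another unit is the inner product of their feature-space means. It is not the authors' implementation. All function names (kernel_kmeans, cluster_kernel, build_units, unit_gram) are hypothetical, and the informative under-sampling step and the SVM solver itself are omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def rbf(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_kmeans(points: List[Point], k: int, gamma: float = 1.0,
                  iters: int = 50) -> List[int]:
    """Cluster points by distance to cluster means in the RBF feature space.

    Uses ||phi(x) - m_C||^2 = k(x,x) - 2/|C| sum_j k(x, x_j)
                              + 1/|C|^2 sum_{i,j in C} k(x_i, x_j),
    so the feature map phi is never formed explicitly.
    """
    n = len(points)
    K = [[rbf(p, q, gamma) for q in points] for p in points]
    labels = [i % k for i in range(n)]  # simple deterministic initialisation
    for _ in range(iters):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Squared norm of each cluster mean, ||m_C||^2, computed once per pass.
        mean_sq = [sum(K[i][j] for i in m for j in m) / len(m) ** 2 if m else 0.0
                   for m in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, m in enumerate(members):
                if not m:  # skip clusters that became empty
                    continue
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + mean_sq[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
    return labels


def cluster_kernel(a: List[Point], b: List[Point], gamma: float = 1.0) -> float:
    """Inner product of the feature-space means of clusters a and b:
    <m_a, m_b> = (1 / (|a| |b|)) * sum_{x in a} sum_{y in b} k(x, y)."""
    return sum(rbf(x, y, gamma) for x in a for y in b) / (len(a) * len(b))


def build_units(pos: List[Point], neg: List[Point], k: int,
                gamma: float = 1.0) -> Tuple[List[List[Point]], List[int]]:
    """Keep each minority (positive) example as its own unit and replace the
    majority (negative) class by at most k feature-space clusters."""
    labels = kernel_kmeans(neg, k, gamma)
    clusters = [[p for p, lab in zip(neg, labels) if lab == c] for c in range(k)]
    clusters = [c for c in clusters if c]  # drop empty clusters
    units = [[p] for p in pos] + clusters
    y = [1] * len(pos) + [-1] * len(clusters)
    return units, y


def unit_gram(units: List[List[Point]], gamma: float = 1.0) -> List[List[float]]:
    """Gram matrix over training units, for an SVM solver that accepts a
    precomputed kernel."""
    return [[cluster_kernel(u, v, gamma) for v in units] for u in units]


if __name__ == "__main__":
    pos = [(2.5, 2.5), (2.6, 2.4)]
    neg = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    units, y = build_units(pos, neg, k=2)
    G = unit_gram(units)
    print(len(units), y)  # prints: 4 [1, 1, -1, -1]
```

In this toy run the SVM would see 4 training units (2 positives plus 2 negative clusters) instead of 8 examples, which is the kind of reduction representative under-sampling aims for. How cluster sizes should weight each unit in the SVM objective is left open in this sketch.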

DOI: 10.1145/1180639.1180729

9 Figures and Tables

Semantic Scholar estimates that this publication has 95 citations based on the available data.

Cite this paper

@inproceedings{Yuan2006LearningCF,
  title={Learning concepts from large scale imbalanced data sets using support cluster machines},
  author={Jinhui Yuan and Jianmin Li and Bo Zhang},
  booktitle={ACM Multimedia},
  year={2006}
}