# Feature Selection via Concave Minimization and Support Vector Machines

@inproceedings{Bradley1998FeatureSV, title={Feature Selection via Concave Minimization and Support Vector Machines}, author={Paul S. Bradley and Olvi L. Mangasarian}, booktitle={ICML}, year={1998} }

Computational comparison is made between two feature selection approaches for finding a separating plane that discriminates between two point sets in an n-dimensional feature space while utilizing as few of the n features (dimensions) as possible. In the concave minimization approach [19, 5], a separating plane is generated by minimizing a weighted sum of distances of misclassified points to two parallel planes that bound the sets and determine the separating plane midway between them…
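The bounding-planes formulation in the abstract can be sketched as a single linear program, using the 1-norm on the plane's normal vector to suppress features (the paper's "SVM ‖·‖₁" variant). This is an illustrative sketch assuming SciPy's `linprog`; the helper name `sparse_svm_lp` and the weighting choices are this sketch's own, not code from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_svm_lp(A, B, lam=0.1):
    """Sparse separating plane x.w = gamma between point sets A and B.

    Minimizes the average slack of points violating the parallel
    bounding planes x.w = gamma + 1 and x.w = gamma - 1, plus a
    1-norm penalty lam*||w||_1 that drives feature weights to zero.
    """
    m1, n = A.shape
    m2 = B.shape[0]
    # Variables: p (n), q (n), gamma (1), y (m1), z (m2); w = p - q, p,q >= 0,
    # so ||w||_1 is bounded by sum(p) + sum(q).
    c = np.concatenate([lam * np.ones(2 * n), [0.0],
                        np.ones(m1) / m1, np.ones(m2) / m2])
    # A w - gamma + y >= 1   <=>   -A p + A q + gamma - y <= -1
    top = np.hstack([-A, A, np.ones((m1, 1)), -np.eye(m1), np.zeros((m1, m2))])
    # -(B w - gamma) + z >= 1   <=>   B p - B q - gamma - z <= -1
    bot = np.hstack([B, -B, -np.ones((m2, 1)), np.zeros((m2, m1)), -np.eye(m2)])
    A_ub = np.vstack([top, bot])
    b_ub = -np.ones(m1 + m2)
    bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * (m1 + m2)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    x = res.x
    w = x[:n] - x[n:2 * n]
    gamma = x[2 * n]
    return w, gamma
```

Because the objective is linear and solved at a vertex of the feasible polyhedron, many components of w come out exactly zero, which is the feature-suppression effect the paper compares against its concave-minimization alternative.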

## 1,063 Citations

Generalized Support Vector Machines

- Computer Science
- 1998

By setting apart the two functions of a support vector machine: separation of points by a nonlinear surface in the original space of patterns, and maximizing the distance between separating planes in…

Data selection for support vector machine classifiers

- Computer Science, KDD '00
- 2000

The proposed approach incorporates a feature selection procedure that results in a minimal number of input features used by the classifier. This makes MSVM a useful incremental classification tool that maintains only a small fraction of a large dataset before merging and processing it with new incoming data.

Mathematical programming approaches to machine learning and data mining

- Computer Science
- 1998

The feature selection approach via concave minimization computes a separating-plane-based classifier that improves upon the generalization ability of a separating plane computed without feature suppression. These results support the claim that mathematical programming is an effective basis for data mining tools that extract patterns containing "knowledge" from a database, thus achieving "knowledge discovery in databases".

Feature Selection for Nonlinear Kernel Support Vector Machines

- Computer Science, Seventh IEEE International Conference on Data Mining Workshops (ICDMW 2007)
- 2007

An easily implementable mixed-integer algorithm is proposed that generates a nonlinear kernel support vector machine (SVM) classifier with reduced input-space features and improves the accuracy of a full-feature classifier by over 30%.

Integrated classifier hyperplane placement and feature selection

- Mathematics, Expert Syst. Appl.
- 2012

Semi-supervised support vector machines for unlabeled data classification

- Computer Science
- 2001

Computational results show that clustered concave minimization yields test set improvement as high as 20.4% over a linear support vector machine trained on a correspondingly small but randomly chosen subset that is labeled by an expert.

Support vector machine classification via parameterless robust linear programming

- Computer Science, Optim. Methods Softw.
- 2005

It is shown that the problem of minimizing the sum of arbitrary-norm real distances to misclassified points, from a pair of parallel bounding planes of a classification problem, leads to a simple parameterless linear program.

Minimal Kernel Classifiers

- Computer Science, J. Mach. Learn. Res.
- 2002

A finite concave minimization algorithm is proposed for constructing kernel classifiers that use a minimal number of data points both in generating and characterizing a classifier and results in a much faster classifier that requires less storage.

Feature selection combining linear support vector machines and concave optimization

- Computer Science, Optim. Methods Softw.
- 2010

This work proposes a feature selection strategy based on the combination of support vector machines (for obtaining good classifiers) with a concave optimization approach (for finding sparse solutions) and reports results of an extensive computational experience showing the efficiency of the proposed methodology.

Benchmarking Least Squares Support Vector Machine Classifiers

- Computer Science, Machine Learning
- 2004

Both the SVM and LS-SVM classifiers with RBF kernels, in combination with standard cross-validation procedures for hyperparameter selection, achieve comparable test set performances that are consistently very good when compared to a variety of methods described in the literature.

## References

Showing 1–10 of 33 references

Robust linear programming discrimination of two linearly inseparable sets

- Computer Science
- 1992

A single linear programming formulation is proposed which generates a plane that minimizes an average sum of misclassification errors for points belonging to two disjoint point sets in n-dimensional real space, without the imposition of extraneous normalization constraints that inevitably fail to handle certain cases.

An Equivalence Between Sparse Approximation and Support Vector Machines

- Computer Science, Neural Computation
- 1998

If the data are noiseless, the modified version of basis pursuit denoising proposed in this article is equivalent to SVM in the following sense: if applied to the same data set, the two techniques give the same solution, which is obtained by solving the same quadratic programming problem.

Toward Optimal Feature Selection

- Computer Science, ICML
- 1996

An efficient algorithm for feature selection which computes an approximation to the optimal feature selection criterion is given, showing that the algorithm effectively handles datasets with a very large number of features.

A support vector machine approach to decision trees

- Computer Science, 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227)
- 1998

The "optimal" decision tree is characterized, and both primal and dual space formulations for constructing the tree are proposed; the result is a method for generating logically simple decision trees with multivariate linear or nonlinear decisions.

Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms

- Computer Science, Neural Computation
- 1998

This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task and measures the power (ability to detect algorithm differences when they do exist) of these tests.

Parsimonious Least Norm Approximation

- Mathematics, Comput. Optim. Appl.
- 1998

Numerical tests on a signal-processing-based example indicate that the proposed method is comparable to a method that parametrically minimizes the 1-norm of the solution x and the error ‖Ax-b-p‖1, and that both methods are superior, by orders of magnitude, to solutions obtained by least squares.

Readings in Machine Learning

- Computer Science
- 1991

Readings in Machine Learning collects the best of the published machine learning literature, including papers that address a wide range of learning tasks, and that introduce a variety of techniques for giving machines the ability to learn.

Introduction to the theory of neural computation

- Computer Science, The Advanced Book Program
- 1991

This book is a detailed, logically-developed treatment that covers the theory and uses of collective computational networks, including associative memory, feed forward networks, and unsupervised learning.