Fisher Discriminant Analysis with Kernels

  • Sebastian Mika, Jason Weston, Bernhard Schölkopf
  • Published 1999


A non-linear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick, which allows the efficient computation of the Fisher discriminant in feature space. The linear classification in feature space corresponds to a (powerful) non-linear decision function in input space. Large-scale simulations demonstrate the competitiveness of our approach.

DISCRIMINANT ANALYSIS

In classification and other data-analytic tasks it is often necessary to pre-process the data before applying the algorithm at hand, and it is common to first extract features suitable for the task to be solved. Feature extraction for classification differs significantly from feature extraction for describing data. For example, PCA finds directions with minimal reconstruction error by describing as much of the variance of the data as possible with m orthogonal directions. The first of these directions need not (and in practice often will not) reveal the class structure needed for proper classification. Discriminant analysis addresses the following question: given a data set with two classes, say, which is the best feature or feature set (either linear or non-linear) to discriminate the two classes? Classical approaches tackle this question by starting with the (theoretically) optimal Bayes classifier; by assuming normal distributions for the classes, standard algorithms such as quadratic or linear discriminant analysis, among them the famous Fisher discriminant, can be derived (e.g. [5]). Of course, a model other than a Gaussian could be assumed for the class distributions; this, however, often sacrifices the simple closed-form solution. Several modifications towards more general features have been proposed (e.g. [8]); for an introduction and review of existing methods see e.g. [3, 5, 8, 11].
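The contrast between PCA and a discriminant direction can be illustrated with a small sketch. It is not taken from the paper: the toy data, the restriction to two dimensions, and the helper names (`scatter`, `pca_direction`, `fisher_direction`, `project`) are all assumptions made for illustration. The Fisher direction is computed in its usual linear form, proportional to S_W^{-1}(m_a - m_b), where S_W is the within-class scatter matrix and m_a, m_b are the class means.

```python
import math

# Toy data, made up for illustration: both classes are stretched along x
# and separated only along y.
class_a = [(-4.0, 1.2), (-2.0, 0.8), (0.0, 1.1), (2.0, 0.9), (4.0, 1.0)]
class_b = [(-4.0, -1.0), (-2.0, -1.2), (0.0, -0.9), (2.0, -0.8), (4.0, -1.1)]


def mean(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def scatter(points):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around the mean."""
    mx, my = mean(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy


def pca_direction(points):
    """Leading eigenvector of the 2x2 scatter matrix (closed form)."""
    sxx, sxy, syy = scatter(points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta))


def fisher_direction(a, b):
    """Unit vector along S_W^{-1} (m_a - m_b), with S_W the summed class scatter."""
    (ax, ay), (bx, by) = mean(a), mean(b)
    sxx, sxy, syy = (u + v for u, v in zip(scatter(a), scatter(b)))
    det = sxx * syy - sxy * sxy
    dx, dy = ax - bx, ay - by
    # Apply the explicit inverse of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    wx = (syy * dx - sxy * dy) / det
    wy = (sxx * dy - sxy * dx) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)


def project(points, direction):
    """One-dimensional feature: dot product of each point with the direction."""
    return [x * direction[0] + y * direction[1] for x, y in points]


pca_dir = pca_direction(class_a + class_b)
fisher_dir = fisher_direction(class_a, class_b)
print("PCA direction:   ", pca_dir)
print("Fisher direction:", fisher_dir)
```

On this data the first PCA direction lies almost along the x axis, where the variance is largest but the two classes overlap completely, while the Fisher direction lies almost along the y axis, where the projections of the two classes do not overlap.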
In this work we propose to use the kernel idea [1], originally applied in Support Vector Machines [19, 14], kernel PCA [16] and other kernel-based algorithms (cf. [14]), to define a non-linear generalization of Fisher's discriminant. Our method uses kernel feature spaces yielding a highly flexible
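A minimal sketch of how a Fisher discriminant can be computed through a kernel is given here. It follows one common formulation, in which the discriminant direction is expanded over the training points with coefficients alpha and alpha is taken proportional to (N + mu I)^{-1}(M_a - M_b), where M_a and M_b are the kernel class means and N is the within-class scatter expressed through the Gram matrix. This excerpt does not specify those details, so the following are assumptions made for illustration: the RBF kernel and its `gamma`, the regularizer `mu`, the midpoint threshold, the scalar toy data, and the helper names (`rbf`, `solve`, `fit_kfd`, `project`, `classify`).

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel on scalars (kernel choice and gamma are assumptions)."""
    return math.exp(-gamma * (x - y) ** 2)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_kfd(class_a, class_b, kernel=rbf, mu=1e-3):
    """Return (training points, expansion coefficients alpha, threshold)."""
    points = class_a + class_b
    n = len(points)
    gram = [[kernel(p, q) for q in points] for p in points]
    groups = [range(0, len(class_a)), range(len(class_a), n)]
    # Kernel class means: (M_i)_j = average of k(x_j, x) over x in class i.
    means = [[sum(gram[j][k] for k in idx) / len(idx) for j in range(n)]
             for idx in groups]
    # Within-class scatter in kernel form, plus mu * I as a regularizer.
    within = [[mu if r == c else 0.0 for c in range(n)] for r in range(n)]
    for idx, m in zip(groups, means):
        for r in range(n):
            for c in range(n):
                within[r][c] += (sum(gram[r][k] * gram[c][k] for k in idx)
                                 - len(idx) * m[r] * m[c])
    diff = [ma - mb for ma, mb in zip(means[0], means[1])]
    alpha = solve(within, diff)
    # Threshold halfway between the projected class means (an assumption).
    proj_means = [sum(a * v for a, v in zip(alpha, m)) for m in means]
    return points, alpha, 0.5 * (proj_means[0] + proj_means[1])


def project(x, points, alpha, kernel=rbf):
    """One-dimensional feature: sum_j alpha_j * k(x_j, x)."""
    return sum(a * kernel(p, x) for a, p in zip(alpha, points))


def classify(x, points, alpha, threshold, kernel=rbf):
    return "a" if project(x, points, alpha, kernel) > threshold else "b"


# Toy data, made up for illustration: class "a" sits between the two halves
# of class "b", so no single threshold on x itself separates them.
inner = [-1.0, 0.0, 1.0]
outer = [-4.0, -3.0, 3.0, 4.0]
points, alpha, threshold = fit_kfd(inner, outer)
labels = [classify(x, points, alpha, threshold) for x in inner + outer]
print(labels)
```

The projection is linear in the kernel feature space but non-linear in the input x, which is what allows the inner class to be separated from the two outer groups. In practice the Gaussian elimination helper would normally be replaced by a numerical library routine; it is written out here only to keep the sketch self-contained.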

Cite this paper

@inproceedings{Mikat1999FisherDA, title={Fisher Discriminant Analysis with Kernels}, author={Sebastian Mika and Jason Weston and Bernhard Sch{\"o}lkopf}, year={1999} }