Estimating the Support of a High-Dimensional Distribution

Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex Smola, Robert C. Williamson. Neural Computation, 2001.
Suppose you are given a data set drawn from an underlying probability distribution P, and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. The proposed approach estimates a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space.
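A minimal sketch of the ν-parameterized one-class SVM this paper describes, assuming scikit-learn is available; the Gaussian data, `gamma`, and `nu` values here are illustrative choices, not the paper's experiments:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))   # sample from the underlying distribution P
X_test = rng.normal(size=(50, 2))

# nu upper-bounds the fraction of training points allowed outside the
# estimated region S (and lower-bounds the fraction of support vectors).
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X_train)

inside = clf.predict(X_test) == 1     # +1: inside S, -1: outside
outlier_frac = np.mean(~inside)
```

The sign of the learned decision function f plays the role described above: positive inside the estimated support region, negative on its complement.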

Support Vector Method for Novelty Detection

The algorithm is a natural extension of the support vector algorithm to the case of unlabelled data and is regularized by controlling the length of the weight vector in an associated feature space.

Learning from positive and unlabeled examples by enforcing statistical significance

This work formalizes the problem of characterizing the positive class as that of learning a feature-based score function that minimizes the p-value of a nonparametric statistical hypothesis test, and provides a solution computed by a one-class SVM applied to a surrogate dataset.

Covariate Shift by Kernel Mean Matching

This paper solves the problem of re-weighting the training data so that its distribution more closely matches that of the test data, by matching covariate distributions between training and test sets in a high-dimensional feature space (specifically, a reproducing kernel Hilbert space).
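A simplified sketch of kernel mean matching in NumPy. The paper poses a constrained quadratic program over the weights; here, as an illustration only, the unconstrained regularized system is solved and then clipped to nonnegative weights. The RBF bandwidth, sample sizes, and shift are arbitrary assumptions:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    # Pairwise RBF kernel matrix between row-vector samples A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
X_tr = rng.normal(0.0, 1.0, size=(100, 1))   # training distribution
X_te = rng.normal(0.5, 1.0, size=(80, 1))    # shifted test distribution

K = rbf(X_tr, X_tr)
kappa = (len(X_tr) / len(X_te)) * rbf(X_tr, X_te).sum(axis=1)

# Match the weighted training mean embedding to the test mean embedding:
# minimize 0.5 b'Kb - kappa'b.  Regularized solve + clip stands in for the QP.
beta = np.linalg.solve(K + 1e-3 * np.eye(len(X_tr)), kappa)
beta = np.clip(beta, 0.0, None)
beta *= len(X_tr) / beta.sum()               # normalize mean weight to 1
```

Training points lying where the test distribution has more mass receive larger weights, so a learner trained on the re-weighted sample targets the test distribution.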

Exact rates in density support estimation

Support Measure Data Description

This work addresses the problem of learning a data description model for a dataset whose elements or observations are themselves sets of points in R^D, computing a minimum volume set for the corresponding probability measures by means of a minimum enclosing ball of their representer functions in a reproducing kernel Hilbert space (RKHS).

Spectral Regularization for Support Estimation

A new class of regularized spectral estimators is introduced, based on a notion of reproducing kernel Hilbert space called "completely regular", which allows one to capture the relevant geometric and topological properties of an arbitrary probability space.

Support Distribution Machines

The projection of the estimated Gram matrix onto the cone of positive semidefinite matrices enables us to employ the kernel trick, and hence use kernel machines for classification, regression, anomaly detection, and low-dimensional embedding in the space of distributions.

The One Class Support Vector Machine Solution Path

Gyemin Lee and C. Scott. 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), 2007.
A heuristic for enforcing nestedness of the sets along the path is introduced, and a method for kernel bandwidth selection based on minimum integrated volume, a kind of AUC criterion, is presented.

A Kernel Two-Sample Test

This work proposes a framework for analyzing and comparing distributions, which is used to construct statistical tests to determine whether two samples are drawn from different distributions, and presents two distribution-free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
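The MMD statistic at the heart of this test is simple to compute. A minimal sketch of the biased squared-MMD estimate with an RBF kernel, assuming NumPy; the bandwidth and Gaussian samples are illustrative assumptions:

```python
import numpy as np

def mmd2_biased(X, Y, gamma=1.0):
    # Biased estimate of squared MMD: ||mean_k(X) - mean_k(Y)||^2 in the RKHS.
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(2)
same = mmd2_biased(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))
diff = mmd2_biased(rng.normal(size=(100, 2)), rng.normal(2.0, 1.0, size=(100, 2)))
```

When the two samples come from the same distribution the statistic is near zero; a shift between the samples drives it up, which is what the large-deviation-based thresholds in the paper exploit.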

Learning Minimum Volume Sets with Support Vector Machines

The reduction of minimum volume (MV) set estimation to Neyman-Pearson (NP) classification is described, improved methods for generating artificial uniform data for the two-class approach are devised, and a new performance measure for the systematic comparison of MV-set algorithms is advocated.
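A rough sketch of the two-class reduction described above, assuming scikit-learn: artificial uniform negatives are drawn on the data's bounding box and a classifier separates them from the observed sample. The random forest, sample sizes, and 0.5 threshold are illustrative stand-ins, not the paper's NP-classification procedure:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 2))                 # observed sample

# Artificial uniform negatives on the bounding box of the data.
lo, hi = X.min(axis=0), X.max(axis=0)
U = rng.uniform(lo, hi, size=(300, 2))

# Two-class reduction: real data (label 1) vs. uniform background (label 0).
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(np.vstack([X, U]), np.r_[np.ones(300), np.zeros(300)])

# Points where P(real | x) exceeds a threshold approximate a density level
# set; sweeping the threshold traces out candidate minimum volume sets.
in_set = clf.predict_proba(X)[:, 1] > 0.5
```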

On nonparametric estimation of density level sets

Let X_1, ..., X_n be independent identically distributed observations from an unknown probability density f(·). Consider the problem of estimating the level set G = G_f(λ) = {x ∈ R^2 : f(x) ≥ λ} from the observations X_1, ..., X_n.
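A simple plug-in approach to this problem estimates the density nonparametrically and thresholds it at λ. A minimal sketch assuming SciPy's Gaussian KDE; the sample, grid, and level λ = 0.05 are illustrative assumptions, and the paper's rate analysis concerns more refined estimators:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)
X = rng.normal(size=(2, 500))        # gaussian_kde expects shape (dim, n)

kde = gaussian_kde(X)                # kernel density estimate of f
lam = 0.05                           # level lambda

# Plug-in estimate of G = {x : f(x) >= lambda}: threshold the KDE on a grid.
grid = np.mgrid[-3:3:61j, -3:3:61j].reshape(2, -1)
in_level_set = kde(grid) >= lam
```

Grid points near the mode fall inside the estimated level set while points in the tails fall outside, giving a discretized approximation of G.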

Generalization Performance of Classifiers in Terms of Observed Covering Numbers

It is shown that one can utilize an analogous argument in terms of the observed covering numbers on a single m-sample (being the actual observed data points) to bound the generalization performance of a classifier by using a margin based analysis.

Detection of Abnormal Behavior Via Nonparametric Estimation of the Support

In this paper two problems are considered, both involving the nonparametric estimation of the support of a random vector from a sequence of independent identically distributed observations.

Structural Risk Minimization Over Data-Dependent Hierarchies

A result is presented that allows one to trade off errors on the training sample against improved generalization performance, and a more general result in terms of "luckiness" functions, which provides a quite general way for exploiting serendipitous simplicity in observed data to obtain better prediction accuracy from small training sets.

Margin Distribution Bounds on Generalization

It is shown that a slight generalization of their construction can be used to give a PAC-style bound on the tail of the distribution of the generalization errors that arise from a given sample size.

Entropy Numbers, Operators and Support Vector Kernels

New bounds for the generalization error of feature space machines, such as support vector machines and related regularization networks, are derived by obtaining new bounds on their covering numbers by virtue of the eigenvalues of an integral operator induced by the kernel function used by the machine.

Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators

New bounds for the generalization error of kernel machines, such as support vector machines and related regularization networks, are derived by obtaining new bounds on their covering numbers by using the eigenvalues of an integral operator induced by the kernel function used by the machine.

Kernel method for percentile feature extraction

A method is proposed which computes a direction in a dataset such that a specified fraction of a particular class of examples is separated from the overall mean by a maximal margin; this method can be thought of as a robust form of principal component analysis, where instead of the variance the authors maximize percentile thresholds.

Support vector learning

This book provides a comprehensive analysis of what can be done using support vector machines, achieving record results in real-life pattern recognition problems, and proposes a new form of nonlinear principal component analysis using support vector kernel techniques, presented as a natural and elegant generalization of classical principal component analysis.

Extracting Support Data for a Given Task

It is observed that three different types of handwritten digit classifiers construct their decision surfaces from strongly overlapping small subsets of the database, which opens up the possibility of compressing databases significantly by discarding the data that is not important for the solution of a given task.