Kristin P. Bennett

Learn More
We examine linear program (LP) approaches to boosting and demonstrate their efficient solution using LPBoost, a column generation based simplex method. We formulate the problem as if all possible weak hypotheses had already been generated. The labels produced by the weak hypotheses become the new feature space of the problem. The boosting task becomes to(More)
We introduce a semi-supervised support vector machine (S3yM) method. Given a training set of labeled data and a working set of unlabeled data, S3YM constructs a support vector machine using both the training and working sets. We use S3YM to solve the transduction problem using overall risk minimization (ORM) posed by Yapnik. The transduction problem is to(More)
We describe a methodology for performing variable ranking and selection using support vector machines (SVMs). The method constructs a series of sparse linear SVMs to generate linear models that can generalize well, and uses a subset of nonzero weighted variables found by the linear models to produce a final nonlinear model. The method exploits the fact that(More)
Support Vector Machines (SVMs) and related kernel methods have become increasingly popular tools for data mining tasks such as classification, regression, and novelty detection. The goal of this tutorial is to provide an intuitive explanation of SVMs from a geometric perspective. The classification problem is used to investigate the basic concepts behind(More)
We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and quadratic programming (QP) approaches based on Vapnik’s Support Vector(More)
An adaptive semi-supervised ensemble method, ASSEMBLE, is proposed that constructs classification ensembles based on both labeled and unlabeled data. ASSEMBLE alternates between assigning "pseudo-classes" to the unlabeled data using the existing ensemble and constructing the next base classifier using both the labeled and pseudolabeled data. Mathematically,(More)
Receiver Operating Characteristic (ROC) curves provide a powerful tool for visualizing and comparing classification results. Regression Error Characteristic (REC) curves generalize ROC curves to regression. REC curves plot the error tolerance on the xaxis versus the percentage of points predicted within the tolerance on the y-axis. The resulting curve(More)
We consider the problem of performing Nearest-neighbor queries efficiently over large high-dimensional databases. To avoid a full database scan, we target constructing a multidimensional index structure. It is well-accepted that traditional database indexing algorithms fail for high-dimensional data (say d > 10 or 20 depending on the scheme). Some arguments(More)