Learn More
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation 1s very high. Also relevant are high-dimensional Euclidian settings in which the distribution of(More)
In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows(More)
This paper 1 describes PicHunter, an image retrieval system that implements a novel approach to relevance feedback, such that the entire history of user selections contributes to the system's estimate of the user's goal image. To accomplish this, PicHunter uses Bayesian learning based on a probabilistic model of a user's behavior. The predictions of this(More)
Natural learners rarely have access to perfectly labeled data { motivating the study of unsupervised learning in an attempt to assign labels. An alternative viewpoint, which avoids the issue of labels entirely , has as the learner's goal the discovery of an eeective metric with which similarity judgments can be made. We refer to this paradigm as metric(More)
We propose a self-organizing archival Intermem-ory. That is, a noncommercial subscriber-provided distributed information storage service built on the existing Internet. Given an assumption of continued growth in the memory's total size, a subscriber's participation for only a nite time can nevertheless ensure archival preservation of the subscriber's data.(More)
If DNA were a random string over its alphabet {A, C, G, T}, an optimal code would assign two bits to each nucleotide. DNA may be imagined to be a highly ordered, purposeful molecule, and one might therefore reasonably expect statistical models of its string representation to produce much lower entropy estimates. Surprisingly, this has not been the case for(More)
We consider the problem of feature-based face recognition in the setting where only a single example of each face is available for training. The mixture-distance technique we introduce achieves a recognition rate of 95% on a database of 685 people in which each face is represented by 30 measured distances. This is currently the best recorded recognition(More)
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to(More)