Peter N. Yianilos

We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of(More)
In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows(More)
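The edit distance defined above can be illustrated with the classical unit-cost dynamic program. This is only a minimal sketch of the standard definition, not the stochastic model the report introduces; the function name `edit_distance` is a hypothetical choice made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b.

    Classical unit-cost dynamic program (an illustration of the definition,
    not the stochastic model described in the report).
    """
    # prev[j] is the distance between the already-processed prefix of a
    # and the first j characters of b; initially the prefix of a is empty.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match for free
            ))
        prev = curr
    return prev[-1]
```

The sketch keeps only two rows of the table, so memory grows with the length of `b` rather than with the product of the two lengths.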
This paper describes PicHunter, an image retrieval system that implements a novel approach to relevance feedback. It uses Bayesian learning based on a probabilistic model of a user's behavior. The predictions of this model are combined with the selections made during a search to choose the images to display. The details of our model were tuned using an(More)
If DNA were a random string over its alphabet {A, C, G, T}, an optimal code would assign two bits to each nucleotide. DNA may be imagined to be a highly ordered, purposeful molecule, and one might therefore reasonably expect statistical models of its string representation to produce much lower entropy estimates. Surprisingly, this has not been the case for(More)
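The two-bit figure follows from the entropy of a uniform distribution over four symbols, log2(4) = 2. As a rough illustration, the sketch below computes the empirical entropy of a string from its single-symbol frequencies. This zeroth-order estimate is an assumption made for the example and is far simpler than the statistical models the abstract refers to; the name `empirical_entropy_bits` is hypothetical.

```python
import math
from collections import Counter


def empirical_entropy_bits(seq: str) -> float:
    """Entropy in bits per symbol under a zeroth-order (symbol frequency) model."""
    counts = Counter(seq)
    n = len(seq)
    # Sum of -p * log2(p) over the symbols that actually occur in seq.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A string in which A, C, G, and T occur equally often reaches the two-bit maximum, while any skew in the frequencies lowers the estimate.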
A new algorithm and a systematic evaluation are presented for searching a database via relevance feedback. The algorithm represents a new image display strategy for the PicHunter system [2, 1]. It takes feedback in the form of relative judgments (“item A is more relevant than item B”) as opposed to the stronger assumption of categorical relevance(More)
This paper addresses how the effectiveness of a content-based, multimedia information retrieval system can be measured, and how such a system should best use response feedback in performing searches. We propose a simple, quantifiable measure of an image retrieval system's effectiveness, “target testing”, in which effectiveness is measured as the average(More)
We consider the problem of feature-based face recognition in the setting where only a single example of each face is available for training. The mixture-distance technique we introduce achieves a recognition rate of 95% on a database of 685 people in which each face is represented by 30 measured distances. This is currently the best recorded recognition(More)
An Archival Intermemory solves the problem of highly survivable digital data storage in the spirit of the Internet. In this paper we describe a prototype implementation of Intermemory, including an overall system architecture and implementations of key system components. The result is a working Intermemory that tolerates up to 17 simultaneous node failures,(More)