We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of…

In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows…
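
The edit distance defined above can be illustrated with a minimal sketch. This is the classic unit-cost dynamic program, not the stochastic model the report proposes; the function name `edit_distance` and the choice of cost 1 for every operation are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance: the minimum number of insertions,
    deletions, and substitutions needed to turn `a` into `b`.
    (Illustrative only; the report's stochastic model is not shown.)"""
    # prev[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb, or match
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory grows with the length of `b` rather than with the product of both lengths.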

This paper describes PicHunter, an image retrieval system that implements a novel approach to relevance feedback, such that the entire history of user selections contributes to the system's estimate of the user's goal image. To accomplish this, PicHunter uses Bayesian learning based on a probabilistic model of a user's behavior. The predictions of this…

- Peter N Yianilos
- 1995

Natural learners rarely have access to perfectly labeled data, motivating the study of unsupervised learning in an attempt to assign labels. An alternative viewpoint, which avoids the issue of labels entirely, has as the learner's goal the discovery of an effective metric with which similarity judgments can be made. We refer to this paradigm as metric…

We propose a self-organizing archival Intermemory: a noncommercial, subscriber-provided distributed information storage service built on the existing Internet. Given an assumption of continued growth in the memory's total size, a subscriber's participation for only a finite time can nevertheless ensure archival preservation of the subscriber's data.…

If DNA were a random string over its alphabet {A, C, G, T}, an optimal code would assign two bits to each nucleotide. DNA may be imagined to be a highly ordered, purposeful molecule, and one might therefore reasonably expect statistical models of its string representation to produce much lower entropy estimates. Surprisingly, this has not been the case for…
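
The two-bit figure follows from the entropy of a uniform distribution over four symbols, log2(4) = 2. As a rough sketch of that baseline, the following computes the empirical order-0 (single-symbol frequency) entropy of a string. It is not the model studied in the paper; the function name `entropy_bits_per_symbol` and the order-0 assumption are illustrative choices.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(seq: str) -> float:
    """Empirical order-0 entropy of `seq` in bits per symbol.
    Assumes symbols are independent draws from their observed
    frequencies, which real DNA models do not assume."""
    counts = Counter(seq)
    n = len(seq)
    # Shannon entropy: -sum p * log2(p) over observed symbols.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Equal counts of A, C, G, T reach the two-bit ceiling.
    print(entropy_bits_per_symbol("ACGT" * 10))
    # A skewed composition falls below two bits per nucleotide.
    print(entropy_bits_per_symbol("AAAAAAAC"))
```

Any estimate below two bits per nucleotide indicates statistical structure that a code could exploit.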

A new algorithm and a systematic evaluation are presented for searching a database via relevance feedback. It represents a new image display strategy for the PicHunter system [2, 1]. The algorithm takes feedback in the form of relative judgments ("item A is more relevant than item B") as opposed to the stronger assumption of categorical relevance judgments…

This paper addresses how the effectiveness of a content-based, multimedia information retrieval system can be measured, and how such a system should best use response feedback in performing searches. We propose a simple, quantifiable measure of an image retrieval system's effectiveness, "target testing", in which effectiveness is measured as the average…

The Bayesian relevance-feedback approach introduced with the PicHunter system [5] is extended to include hidden semantic attributes. The general approach is motivated and experimental results are presented that demonstrate significant reductions in search times (28-32%) using these annotations.