We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidean space, or where the dimensionality of a Euclidean representation is very high. Also relevant are high-dimensional Euclidean settings in which the distribution of…
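The abstract does not name a data structure. As a hedged illustration of nearest-neighbor search that uses nothing but a distance function (no coordinates, no embedding), here is a minimal sketch of a vantage-point tree, one well-known structure for general metric spaces. It is not presented as the paper's method; the class and method names (`VPTree`, `nearest`) are hypothetical, and `dist` is assumed to satisfy the triangle inequality, which is what makes the pruning exact.

```python
import math
import random
from typing import Callable, Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class _Node(Generic[T]):
    """One vantage point plus the median radius that splits its subtree."""

    def __init__(self, point: T, radius: float,
                 inside: "Optional[_Node[T]]", outside: "Optional[_Node[T]]"):
        self.point = point
        self.radius = radius
        self.inside = inside    # points p with dist(point, p) < radius
        self.outside = outside  # points p with dist(point, p) >= radius


class VPTree(Generic[T]):
    """Exact nearest-neighbor search given only a metric `dist` (hypothetical sketch)."""

    def __init__(self, points: List[T], dist: Callable[[T, T], float]):
        self.dist = dist
        self.root = self._build(list(points))

    def _build(self, points: List[T]) -> "Optional[_Node[T]]":
        if not points:
            return None
        vp = points.pop(random.randrange(len(points)))  # random vantage point
        if not points:
            return _Node(vp, 0.0, None, None)
        ds = [self.dist(vp, p) for p in points]
        radius = sorted(ds)[len(ds) // 2]  # median distance to the vantage point
        inside = [p for p, d in zip(points, ds) if d < radius]
        outside = [p for p, d in zip(points, ds) if d >= radius]
        return _Node(vp, radius, self._build(inside), self._build(outside))

    def nearest(self, q: T) -> Tuple[Optional[T], float]:
        best: Optional[T] = None
        best_d = math.inf

        def visit(node: "Optional[_Node[T]]") -> None:
            nonlocal best, best_d
            if node is None:
                return
            d = self.dist(q, node.point)
            if d < best_d:
                best, best_d = node.point, d
            r = node.radius
            # Triangle inequality: the inside ball can hold a closer point only if
            # d - best_d < r; the outside shell only if d + best_d > r.
            # Search the side containing q first so best_d shrinks early.
            if d < r:
                visit(node.inside)  # d - best_d < r always holds here
                if d + best_d > r:
                    visit(node.outside)
            else:
                visit(node.outside)
                if d - best_d < r:
                    visit(node.inside)

        visit(self.root)
        return best, best_d
```

For example, `VPTree(points, math.dist).nearest(q)` returns the closest point and its distance; because the tree only ever calls `dist`, the same code works unchanged for strings under an edit distance or any other metric.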
In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows…
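The edit distance defined above can be computed with the classic dynamic program over prefixes. This sketch covers only the deterministic, unit-cost definition in the text, not the stochastic model the report proposes; the function name `edit_distance` is hypothetical.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    # prev[j] holds the distance between the current prefix of s and t[:j].
    prev = list(range(len(t) + 1))  # "" -> t[:j] needs j insertions
    for i, a in enumerate(s, start=1):
        cur = [i] + [0] * len(t)    # s[:i] -> "" needs i deletions
        for j, b in enumerate(t, start=1):
            cur[j] = min(prev[j] + 1,             # delete a
                         cur[j - 1] + 1,          # insert b
                         prev[j - 1] + (a != b))  # substitute (free if a == b)
        prev = cur
    return prev[len(t)]
```

For instance, `edit_distance("kitten", "sitting")` is 3 (two substitutions and one insertion). Keeping only the previous row gives O(len(t)) memory and O(len(s) * len(t)) time.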
This paper describes PicHunter, an image retrieval system that implements a novel approach to relevance feedback, such that the entire history of user selections contributes to the system's estimate of the user's goal image. To accomplish this, PicHunter uses Bayesian learning based on a probabilistic model of a user's behavior. The predictions of this…
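A minimal sketch of how Bayesian relevance feedback lets the whole selection history shape the estimate: keep a posterior over candidate target images and, each round, multiply it by the likelihood of the user's selection. The user model here (a softmin over distances to the target, with bandwidth `sigma`) and the function names are assumptions for illustration; the abstract does not give PicHunter's actual model.

```python
import math
from typing import Callable, Dict, Hashable, Sequence

Image = Hashable
Dist = Callable[[Image, Image], float]


def selection_likelihood(selected: Image, displayed: Sequence[Image],
                         target: Image, dist: Dist, sigma: float) -> float:
    """Hypothetical user model: P(user picks `selected` from `displayed` | target).

    Softmin over distances to the target: displayed images closer to the
    target are more likely to be chosen.
    """
    weights = [math.exp(-dist(x, target) / sigma) for x in displayed]
    return math.exp(-dist(selected, target) / sigma) / sum(weights)


def update_posterior(posterior: Dict[Image, float], displayed: Sequence[Image],
                     selected: Image, dist: Dist, sigma: float = 1.0) -> Dict[Image, float]:
    """One Bayes step: weight each candidate target by this round's likelihood, renormalize."""
    unnorm = {t: p * selection_likelihood(selected, displayed, t, dist, sigma)
              for t, p in posterior.items()}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}
```

Usage: start from a uniform prior such as `{img: 1 / len(images) for img in images}` and call `update_posterior` once per round. Because each call starts from the previous posterior, every earlier selection remains a factor in the current estimate, rather than only the latest query.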
Natural learners rarely have access to perfectly labeled data, which motivates the study of unsupervised learning in an attempt to assign labels. An alternative viewpoint, which avoids the issue of labels entirely, takes the learner's goal to be the discovery of an effective metric with which similarity judgments can be made. We refer to this paradigm as metric…
We propose a self-organizing archival Intermemory, that is, a noncommercial, subscriber-provided distributed information storage service built on the existing Internet. Given an assumption of continued growth in the memory's total size, a subscriber's participation for only a finite time can nevertheless ensure archival preservation of the subscriber's data.
If DNA were a random string over its alphabet {A, C, G, T}, an optimal code would assign two bits to each nucleotide. DNA may be imagined to be a highly ordered, purposeful molecule, and one might therefore reasonably expect statistical models of its string representation to produce much lower entropy estimates. Surprisingly, this has not been the case for…
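The two-bit figure is log2 4: a uniformly random choice among four symbols carries 2 bits, and the entropy H = -Σ p_i log2 p_i of any distribution over {A, C, G, T} is at most that. As a hedged sketch, the simplest statistical model (zero-order, fitting only symbol frequencies) gives an entropy estimate as below; the paper's models are presumably richer, and the function name is hypothetical.

```python
import math
from collections import Counter


def zero_order_entropy(seq: str) -> float:
    """Bits per symbol under an i.i.d. model fitted to the symbol frequencies of seq."""
    counts = Counter(seq)
    n = len(seq)
    # H = -sum p * log2(p) over observed symbols; 2.0 is the maximum for 4 symbols.
    return -sum(c / n * math.log2(c / n) for c in counts.values())
```

A sequence with equal counts of all four nucleotides scores exactly 2.0 bits per symbol, so any compression below two bits must come from skewed frequencies or from context that a zero-order model ignores.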
A new algorithm and a systematic evaluation are presented for searching a database via relevance feedback. The algorithm represents a new image display strategy for the PicHunter system [2, 1]. It takes feedback in the form of relative judgments ("item A is more relevant than item B") as opposed to the stronger assumption of categorical relevance judgments…
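A relative judgment only says which of two items is closer to what the user wants, so it can be folded into a Bayesian posterior over targets with a pairwise likelihood. The logistic model below, its `sigma` parameter, and the function names are assumptions for illustration, not the algorithm described in the paper.

```python
import math
from typing import Callable, Dict, Hashable

Image = Hashable
Dist = Callable[[Image, Image], float]


def preference_likelihood(a: Image, b: Image, target: Image,
                          dist: Dist, sigma: float = 1.0) -> float:
    """Hypothetical model: P(user says `a` is more relevant than `b` | target).

    A logistic function of how much closer `a` is to the target than `b` is.
    """
    margin = (dist(b, target) - dist(a, target)) / sigma
    # Numerically stable logistic: avoids overflow in math.exp for large |margin|.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def update_with_preference(posterior: Dict[Image, float], a: Image, b: Image,
                           dist: Dist, sigma: float = 1.0) -> Dict[Image, float]:
    """Bayes step for one relative judgment 'a is more relevant than b'."""
    unnorm = {t: p * preference_likelihood(a, b, t, dist, sigma)
              for t, p in posterior.items()}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}
```

Because P(a over b) + P(b over a) = 1 for every target, the judgment shifts mass toward targets nearer `a` without requiring the user to declare any item categorically relevant.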
This paper addresses how the effectiveness of a content-based, multimedia information retrieval system can be measured, and how such a system should best use response feedback in performing searches. We propose a simple, quantifiable measure of an image retrieval system's effectiveness, "target testing", in which effectiveness is measured as the average…
We describe psychophysical experiments conducted to study PicHunter, a content-based image retrieval (CBIR) system. Experiment 1 studies the importance of using (a) semantic information, (b) memory of earlier input, and (c) relative, rather than absolute, judgments of image similarity. The target testing paradigm is used, in which a user must search for an…