On the complexity of reverse similarity search

  • Matthew Skala
  • Published 2008 in
    2008 IEEE 24th International Conference on Data…

Abstract

Two decision problems are presented that arise from reversing the operation of a distance-based indexing tree. Whereas similarity search finds points in the tree given a query point, reverse similarity search begins with a set of constraints like those defining a leaf and generates a point meeting the constraints. These problems derive from robust hashing, a technique used in similarity search and security applications. The problems are analysed for spaces of strings and vectors with a variety of metrics: strings with Hamming distance; the usual (Levenshtein) edit distance; an edit distance we introduce called Superghost distance; arbitrary weighted tree metrics; and real vectors with Minkowski L_p metrics (of which the Euclidean distance is a special case). They are found to inhabit different complexity classes depending on the metric. In particular, the reverse similarity search problem derived from a VP- or GH-tree is NP-complete for any L_p metric except that it is in P for a GH-tree with the Euclidean metric.
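To make the idea of "constraints like those defining a leaf" concrete, the following is a minimal sketch for binary strings under Hamming distance. It is not taken from the paper. The constraint encodings are assumptions made for illustration: a VP-style constraint is written as (vantage point, radius, inside flag), and a GH-style constraint as a pair (x, y) meaning the point must be at least as close to x as to y. The function names (`satisfies_vp`, `satisfies_gh`, `reverse_search`) are hypothetical, and the exhaustive search is only a stand-in for whatever method a particular metric allows.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def satisfies_vp(z, constraints):
    """Check VP-style constraints (assumed encoding, not the paper's).

    Each constraint is (vantage_point, radius, inside): z must lie within
    `radius` of the vantage point when `inside` is True, and strictly
    outside that radius when it is False.
    """
    return all((hamming(z, v) <= r) == inside for v, r, inside in constraints)


def satisfies_gh(z, constraints):
    """Check GH-style constraints (assumed encoding, not the paper's).

    Each constraint is (x, y): z must be at least as close to x as to y.
    """
    return all(hamming(z, x) <= hamming(z, y) for x, y in constraints)


def reverse_search(n, check, constraints, alphabet="01"):
    """Exhaustively look for a length-n string meeting every constraint.

    Runs in time exponential in n; returns None when no string qualifies.
    """
    for chars in product(alphabet, repeat=n):
        z = "".join(chars)
        if check(z, constraints):
            return z
    return None


# Illustrative inputs only; these constraint sets are made up.
vp_constraints = [("0000", 1, True), ("1111", 2, False)]
gh_constraints = [("0000", "1111"), ("1100", "0011")]
contradictory = [("0000", 0, True), ("0000", 0, False)]

vp_witness = reverse_search(4, satisfies_vp, vp_constraints)
gh_witness = reverse_search(4, satisfies_gh, gh_constraints)
no_witness = reverse_search(4, satisfies_vp, contradictory)
```

Checking a candidate point against the constraints takes polynomial time, while this naive search tries every string of the given length. The abstract's results concern how much better than that one can do for each metric.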

DOI: 10.1109/ICDEW.2008.4498355

Cite this paper

@article{Skala2008OnTC,
  title={On the complexity of reverse similarity search},
  author={Matthew Skala},
  journal={2008 IEEE 24th International Conference on Data Engineering Workshop},
  year={2008},
  pages={436-443}
}