Trading Quality for Time with Nearest Neighbor Search

Roger Weber and Klemens Böhm
In many situations, users would readily accept an approximate query result if evaluation of the query becomes faster. In this article, we investigate approximate evaluation techniques based on the VA-File for Nearest-Neighbor Search (NN-Search). The VA-File contains approximations of feature points. These approximations frequently suffice to eliminate the vast majority of points in a first phase. Then, a second phase identifies the NN by computing exact distances of all remaining points. To… 
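The two-phase search described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes points lie in the unit hypercube, that each dimension is quantized on a uniform grid with `bits` bits, and that Euclidean distance is used. The names `quantize`, `cell_bounds`, and `va_nn_search` are hypothetical. The sketch shows only the exact two-phase search, not the approximate variants the article goes on to study.

```python
import math
import random


def quantize(point, bits):
    """Approximate a point in [0, 1)^d by its grid cell (one index per dimension)."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in point)


def cell_bounds(query, cell, bits):
    """Lower and upper bounds on the distance from query to any point in the cell."""
    width = 1.0 / (1 << bits)
    lo_sq = hi_sq = 0.0
    for q, c in zip(query, cell):
        left, right = c * width, (c + 1) * width
        if q < left:
            near = left - q
        elif q > right:
            near = q - right
        else:
            near = 0.0  # query coordinate falls inside the cell's interval
        far = max(abs(q - left), abs(q - right))
        lo_sq += near * near
        hi_sq += far * far
    return math.sqrt(lo_sq), math.sqrt(hi_sq)


def va_nn_search(points, approximations, query, bits):
    """Return (index, distance, exact_distance_computations) for the nearest neighbor."""
    # Phase 1: scan the approximations only. A point can be discarded once its
    # lower bound exceeds the smallest upper bound seen so far.
    best_upper = math.inf
    candidates = []
    for idx, cell in enumerate(approximations):
        lower, upper = cell_bounds(query, cell, bits)
        if lower <= best_upper:
            candidates.append((lower, idx))
            best_upper = min(best_upper, upper)

    # Phase 2: visit the remaining candidates in order of increasing lower bound
    # and compute exact distances; stop when no candidate can beat the best one.
    candidates.sort()
    best_dist, best_idx, visited = math.inf, -1, 0
    for lower, idx in candidates:
        if lower > best_dist:
            break
        visited += 1
        dist = math.dist(query, points[idx])
        if dist < best_dist:
            best_dist, best_idx = dist, idx
    return best_idx, best_dist, visited
```

Given a list `points` of coordinate lists, one would build `approximations = [quantize(p, bits) for p in points]` once and reuse it across queries. The third return value counts how many exact distance computations phase 2 needed, which is the quantity that the filtering in phase 1 is meant to keep small.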
Approximate searches: k-neighbors + precision
This paper describes an approximate search scheme for high-dimensional databases in which the precision of the search can be probabilistically controlled when retrieving the k NNs of query points. It allows fine and intuitive control over this precision by setting, at run time, the maximum probability for a vector in the exact answer set to be missed in the approximate answer set eventually returned.
Approximate nearest neighbor searching in multimedia databases
This work proposes modifications to well-known techniques to support the progressive processing of approximate nearest-neighbor queries and develops a new technique based on clustering that merges the benefits of the two general classes of approaches.
A Performance-guaranteed approximate range query algorithm for the ND-tree
This paper proposes an approximate range query algorithm for the ND-tree, a multi-dimensional index for vectors with non-ordered discrete components, together with a novel volume-based weighting scheme for the priority queue.
VA-Files vs. R*-Trees in Distance Join Queries
VA-file based algorithms for answering similarity join and K closest pairs queries on high-dimensional data are developed and compared.
The Quality vs. Time Trade-off for Approximate Image Descriptor Search
Using a large collection of 5 million 24-dimensional local descriptors computed over more than 50 thousand real-life images, it is shown that minimizing the query processing time may in fact lead to better quality of the intermediate results.
High dimensional nearest neighbor searching
MI-File: using inverted files for scalable approximate similarity search
A new efficient and accurate technique for generic approximate similarity searching, based on inverted files, which very efficiently yields a very small set of good candidates for the query result.
Exploring Bit-Difference for Approximate KNN Search in High-dimensional Databases
A novel index structure based on bit-difference is developed, which uses clustering, a cluster-adapted bitcoder, and dimensional weights to support efficient approximate k-nearest neighbor (KNN) queries in high-dimensional databases.
Fast Evaluation Techniques for Complex Similarity Queries
A new evaluation technique called Generalized VA-File-based Search (GeVAS) is described, which builds on the VA-File, supports queries over several feature types, and borrows the idea to search an index structure with several query objects in parallel from Ciaccia et al.
Clustering-based Approximate Answering of Query Result in Large and Distributed Databases
An efficient and effective algorithm coined Explore-Select-Rearrange Algorithm (ESRA), based on the SAINTETIQ model, to quickly provide users with hierarchical clustering schemas of their query results, and new algorithms for merging them into a single final one (global model).


Similarity Search in High Dimensions via Hashing
Experimental results indicate that the novel hashing-based scheme for approximate similarity search scales well even for a relatively large number of dimensions, and that the method improves running time over other methods for searching in high-dimensional spaces based on hierarchical tree decomposition.
A cost model for similarity queries in metric spaces
This work insists that the distance distribution of objects can be profitably used to solve the problem of estimating CPU and I/O costs for processing range and k-nearest neighbors queries over metric spaces, and develops a concrete cost model for the M-tree access method.
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.
Fast approximate answers to aggregate queries on a data cube
V. Poosala, Venkatesh Ganti. Proceedings. Eleventh International Conference on Scientific and Statistical Database Management, 1999.
This paper precomputes concise histogram statistics on the data to answer queries quickly but approximately, and proposes the use of multiple histograms to approximate the data cube and answer aggregate queries using this summarized data.
Fast parallel similarity search in multimedia databases
This paper presents a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces, which provides an almost linear speed-up and a constant scale-up, and outperforms the Hilbert approach by a factor of up to 5.
Join synopses for approximate query answering
This paper proposes join synopses as an effective solution for this problem and shows how precomputing just one join synopsis for each relation suffices to significantly improve the quality of approximate answers for arbitrary queries with foreign key joins.
Interactive-Time Similarity Search for Large Image Collections Using Parallel VA-Files
This work parallelizes NN-search based on the VA-File, a linear algorithm that works with approximations of the vectors, in a Network of Workstations (NOW), and reduces search time to a reasonable level for large collections.
The SR-tree: an index structure for high-dimensional nearest neighbor queries
This paper proposes a new index structure called the SR-tree (Sphere/Rectangle-tree), which integrates bounding spheres and bounding rectangles. This enhances the performance of nearest neighbor queries, especially for high-dimensional and non-uniform data, and can be practical in actual image/video similarity indexing.
Content-Based Image Indexing
We formulate the content-based image indexing problem as a multi-dimensional nearest-neighbor search problem, and develop/implement an optimistic vantage-point tree algorithm that can dynamically
When Is "Nearest Neighbor" Meaningful?
The effect of dimensionality on the "nearest neighbor" problem is explored, and it is shown that under a broad set of conditions, as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.
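The effect summarized in this last entry can be observed with a small experiment. The sketch below is only an illustration under one specific assumption, namely that data and query coordinates are drawn independently and uniformly from [0, 1); the summary itself speaks of a broader set of conditions. The function name `nearest_to_farthest_ratio` and its parameters are hypothetical.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=500, seed=0):
    """Ratio of nearest to farthest Euclidean distance from one random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return min(dists) / max(dists)


if __name__ == "__main__":
    # A ratio near 1 means the nearest and farthest points are almost equally far.
    for dim in (2, 10, 100, 1000):
        print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

Under this assumption the printed ratio tends to rise toward 1 as the dimension grows, which is the situation in which the distinction between the nearest and the farthest neighbor loses contrast.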