iDistance: An adaptive B+-tree based indexing method for nearest neighbor search

@article{Jagadish2005iDistanceAA,
  title={iDistance: An adaptive B+-tree based indexing method for nearest neighbor search},
  author={H. V. Jagadish and Beng Chin Ooi and Kian-Lee Tan and Cui Yu and Rui Zhang},
  journal={ACM Trans. Database Syst.},
  year={2005},
  volume={30},
  pages={364--397}
}
In this article, we present an efficient B+-tree based indexing method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional metric space. […] This allows the points to be indexed using a B+-tree structure and KNN search to be performed using one-dimensional range search. The choice of partition and reference points adapts the index structure to the data distribution. We conducted extensive experiments to evaluate the iDistance technique, and report results demonstrating its…
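The abstract's central idea, mapping each point to a one-dimensional key so that KNN search becomes a set of one-dimensional range searches, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a sorted Python list searched with `bisect` stands in for the B+-tree, distances are Euclidean, each point joins the partition of its nearest reference point, and the class name `IDistanceSketch`, the separating constant `c`, and the radius-growth parameters `r0`/`dr` are hypothetical. A point p in partition i gets the key i*c + dist(p, O_i), where O_i is that partition's reference point. By the triangle inequality, every point within radius r of a query q has a key between i*c + dist(q, O_i) - r and i*c + dist(q, O_i) + r, clipped to the partition's extent.

```python
import bisect
import math


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IDistanceSketch:
    """Toy iDistance-style index (hypothetical); a sorted key list stands in for the B+-tree."""

    def __init__(self, points, refs, c):
        self.points = points
        self.refs = refs
        self.c = c
        self.max_r = [0.0] * len(refs)  # radius of each partition
        entries = []
        for pid, p in enumerate(points):
            dists = [euclid(p, o) for o in refs]
            i = min(range(len(refs)), key=dists.__getitem__)  # nearest reference point
            self.max_r[i] = max(self.max_r[i], dists[i])
            entries.append((i * c + dists[i], pid))  # one-dimensional key
        if max(self.max_r) >= c:
            raise ValueError("c must exceed every partition radius")
        entries.sort()
        self.keys = [key for key, _ in entries]
        self.ids = [pid for _, pid in entries]

    def knn(self, q, k, r0=0.05, dr=0.05):
        """Return ids of the k nearest points, enlarging the search radius as needed."""
        if not 0 < k <= len(self.points):
            raise ValueError("k must be between 1 and the number of points")
        r = r0
        while True:
            cand = set()
            for i, o in enumerate(self.refs):
                dq = euclid(q, o)
                if dq - r > self.max_r[i]:
                    continue  # query sphere does not reach this partition
                lo = i * self.c + max(0.0, dq - r)
                hi = i * self.c + min(self.max_r[i], dq + r)
                a = bisect.bisect_left(self.keys, lo)
                b = bisect.bisect_right(self.keys, hi)
                cand.update(self.ids[a:b])  # one-dimensional range search
            best = sorted((euclid(q, self.points[pid]), pid) for pid in cand)
            if len(best) >= k and best[k - 1][0] <= r:
                return [pid for _, pid in best[:k]]
            r += dr  # not enough answers inside r yet; widen the search
```

The loop stops once the k-th candidate lies inside the current radius: any point outside every scanned key range is farther than r from q, so it cannot displace the answers found. How the reference points are chosen, which is where the paper's adaptivity to the data distribution comes in, is left to the caller in this sketch.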

Citations

Efficient nearest neighbor query based on extended B+-tree in high-dimensional space
Composite Distance Transformation for Indexing and k-Nearest-Neighbor Searching in High-Dimensional Spaces
TLDR
A novel composite distance transformation method, called CDT, is proposed to support fast k-nearest-neighbor (k-NN) search in high-dimensional spaces; it outperforms state-of-the-art high-dimensional search techniques such as the X-Tree, VA-file, iDistance and NB-Tree.
Cluster Splitting Based High Dimensional Metric Space Index B+-Tree
TLDR
A new high-dimensional index approach, the cluster-splitting-based high-dimensional B+-tree, is presented; it relies on formulas that compute the "optimal" cluster parameters, i.e. those that minimize the theoretical query cost.
An encoding-based dual distance tree high-dimensional index
TLDR
The results demonstrate that this method outperforms the state-of-the-art high-dimensional search techniques such as the X-tree, VA-file, iDistance and NB-Tree, especially when the query radius is not very large.
A Comprehensive Study of iDistance Partitioning Strategies for kNN Queries and High-Dimensional Data Indexing
TLDR
This work performs the first comprehensive analysis of different partitioning strategies for the state-of-the-art high-dimensional indexing technique iDistance, and establishes an up-to-date iDistance benchmark for efficient kNN querying of large-scale, high-dimensional data.
Cluster Analysis for Optimal Indexing
TLDR
This work introduces clustering for the sake of indexing: new clustering methods designed to optimize data partitioning for an indexing-specific tree structure, rather than to find clusters that reflect the data distribution.
S2R-tree: a pivot-based indexing structure for semantic-aware spatial keyword search
TLDR
This paper proposes a novel pivot-based hierarchical indexing structure, the S2R-tree, that integrates spatial and semantic information seamlessly, and carefully designs a mechanism that transforms high-dimensional semantic vectors into a low-dimensional space so that more effective pruning can be achieved.
Effectiveness of NAQ-tree as index structure for similarity search in high-dimensional metric space
TLDR
This paper studies the effectiveness of a new index structure, the Nested-Approximate-eQuivalence-class tree (NAQ-tree), which overcomes the disadvantages identified in the paper and is constructed by recursively dividing the data set into nested approximate equivalence classes.
Margin-Based Pivot Selection for Similarity Search Indexes
TLDR
This work proposes Maximal Metric Margin Partitioning (MMMP), a partitioning scheme for similarity search indexes, and the MMMP-Index, an indexing scheme that uses MMMP and pivot filtering to reduce query execution cost.
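Pivot filtering, which the MMMP-Index combines with its partitioning scheme, prunes objects using precomputed object-to-pivot distances. The sketch below shows only generic pivot filtering for a range query under the triangle inequality; it does not reproduce MMMP's margin-based partitioning or pivot selection, and the function names `build_pivot_table` and `range_query`, the Euclidean metric, and the fixed pivots are assumptions for illustration.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_table(data, pivots, dist=euclid):
    """Precompute d(o, p) for every object o and pivot p (done once, at build time)."""
    return [[dist(o, p) for p in pivots] for o in data]


def range_query(q, r, data, pivots, table, dist=euclid):
    """Return (indices within distance r of q, number of exact distances computed).

    Assumes at least one pivot.
    """
    dq = [dist(q, p) for p in pivots]  # paid once per query
    result, computed = [], 0
    for i, o in enumerate(data):
        # triangle inequality: d(q, o) >= |d(q, p) - d(o, p)| for every pivot p
        lower = max(abs(a - b) for a, b in zip(dq, table[i]))
        if lower > r:
            continue  # pruned without computing d(q, o)
        computed += 1
        if dist(q, o) <= r:
            result.append(i)
    return result, computed
```

The counter `computed` reports how many exact distance computations survive filtering, which is the query cost that pivot filtering aims to reduce.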
Persistent Semi-Dynamic Ordered Partition Index
TLDR
This work proposes an alternative solution for indexing high-dimensional data that takes advantage of increasing main-memory sizes and the 40% annual improvement in disk transfer rates: it makes the Ordered-Partition tree (OP-tree), a main-memory-resident index, persistent by writing it onto disk.

References

Showing references 1-10 of 54.
Indexing the Distance: An Efficient Method to KNN Processing
TLDR
An efficient method, called iDistance, for K-nearest neighbor (KNN) search in a high-dimensional space: it partitions the data and selects a reference point for each partition, and the paper describes how appropriate choices here can adapt the index structure to the data distribution.
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
TLDR
The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.
Fast nearest neighbor search in high-dimensional space
TLDR
This work precomputes the result of any nearest neighbor search, which corresponds to computing the Voronoi cell of each data point; the approach is based on a precomputation of the solution space, and experiments demonstrate high efficiency for uniformly distributed as well as real data.
The SR-tree: an index structure for high-dimensional nearest neighbor queries
TLDR
This paper proposes a new index structure called the SR-tree (Sphere/Rectangle-tree), which integrates bounding spheres and bounding rectangles; this enhances performance on nearest neighbor queries, especially for high-dimensional and non-uniform data such as that found in practical image/video similarity indexing.
Distance-based indexing for high-dimensional metric spaces
TLDR
This paper introduces a distance-based index structure called the multi-vantage point (mvp) tree for similarity queries on high-dimensional metric spaces, and shows that the mvp-tree outperforms the vp-tree by 20% to 80% for varying query ranges and different distance distributions.
The hybrid tree: an index structure for high dimensional feature spaces
  • K. Chakrabarti, S. Mehrotra
  • Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)
  • 1999
TLDR
The hybrid tree, a multidimensional data structure for indexing high-dimensional feature spaces, is introduced; it significantly outperforms both purely DP-based and SP-based index mechanisms as well as linear scans at all dimensionalities for large databases.
The pyramid-technique: towards breaking the curse of dimensionality
TLDR
The results of experiments demonstrate that the Pyramid-Technique outperforms the X-tree and the Hilbert R-tree by a factor of up to 14 (number of page accesses) and up to 2500 (total elapsed time) for range queries.
FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets
TLDR
A fast algorithm, FastMap, maps objects into points in some k-dimensional space (k is user-defined) such that the dissimilarities are preserved; the paper also brings in a tool from pattern recognition, namely Multi-Dimensional Scaling (MDS).
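FastMap's projection step can be sketched as below, assuming a symmetric distance function that behaves like a metric. Each pass picks two far-apart pivot objects a and b and places every object o on the line through them using the cosine law, x_o = (d(a,o)^2 + d(a,b)^2 - d(b,o)^2) / (2 d(a,b)); later passes work with the residual distances left after removing the coordinates already assigned. The single-pass farthest-object pivot heuristic and the function name `fastmap` are simplifications chosen here, not necessarily the paper's exact procedure.

```python
import math


def fastmap(objects, dist, k):
    """Map objects to k coordinates each (sketch of the FastMap idea)."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # squared distance in the hyperplane orthogonal to the first `col` axes
        base = dist(objects[i], objects[j]) ** 2
        return base - sum((coords[i][c] - coords[j][c]) ** 2 for c in range(col))

    for col in range(k):
        # pivot heuristic: farthest object from object 0, then farthest from that one
        a = max(range(n), key=lambda j: d2(0, j, col))
        b = max(range(n), key=lambda j: d2(a, j, col))
        dab2 = d2(a, b, col)
        if dab2 <= 0:
            break  # all residual distances vanish; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # cosine-law projection onto the line through pivots a and b
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords
```

For points that genuinely lie in a Euclidean space of dimension k, this mapping preserves pairwise distances up to floating-point error; for general dissimilarities it is only an approximation.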
A cost model for query processing in high dimensional data spaces
TLDR
A cost model for index structures for point databases such as the R*-tree and the X-tree is developed that provides accurate estimates of the number of data page accesses for range queries and nearest-neighbor queries under a Euclidean metric and a maximum metric.