BrePartition: Optimized High-Dimensional kNN Search With Bregman Distances

@article{Song2022BrePartitionOH,
  title={BrePartition: Optimized High-Dimensional kNN Search With Bregman Distances},
  author={Yang Song and Yu Gu and Rui Zhang and Ge Yu},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2022},
  volume={34},
  pages={1053-1065}
}
  • Yang Song, Yu Gu, Rui Zhang, Ge Yu
  • Published 30 May 2020
  • Computer Science
  • IEEE Transactions on Knowledge and Data Engineering
Bregman distances (also known as Bregman divergences) are widely used in machine learning, speech recognition and signal processing, and kNN searches with Bregman distances have become increasingly important with the rapid advances of multimedia applications. Data in multimedia applications such as images and videos are commonly transformed into spaces of hundreds of dimensions. Such high-dimensional spaces pose significant challenges for existing kNN search algorithms with Bregman distances… 
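
For concreteness, a Bregman divergence is generated by a strictly convex, differentiable function phi as D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>. A minimal NumPy sketch of this definition (my illustration, not code from the paper), with the squared Euclidean distance as the instance generated by phi(x) = ||x||^2:

    import numpy as np

    def bregman(phi, grad_phi, x, y):
        # D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
        return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

    # phi(v) = ||v||^2 generates the squared Euclidean distance.
    sq_norm = lambda v: np.dot(v, v)
    sq_norm_grad = lambda v: 2.0 * v

    x = np.array([0.2, 0.3, 0.5])
    y = np.array([0.4, 0.4, 0.2])
    assert np.isclose(bregman(sq_norm, sq_norm_grad, x, y),
                      np.sum((x - y) ** 2))

Note that D_phi is in general asymmetric and violates the triangle inequality, which is exactly why metric indexing techniques do not carry over directly.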

Neural Bregman Divergences for Distance Learning

TLDR
This work proposes a new approach to learning arbitrary Bregman divergences in a differentiable manner via input convex neural networks, obtaining the first method for learning neural Bregman divergences and providing the foundation and tooling for developing and studying asymmetric distance learning.
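
For intuition, a minimal PyTorch sketch of the idea (my own illustration, not the authors' released code): an input convex neural network f stays convex in its input when hidden-to-hidden weights are non-negative and activations are convex and non-decreasing; the learned divergence then follows the standard Bregman form, with grad f(y) obtained via autograd.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ICNN(nn.Module):
        # f(x) is convex in x: non-negative hidden-to-hidden weights
        # plus convex, non-decreasing activations (softplus).
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.Wx0 = nn.Linear(dim, hidden)
            self.Wz1 = nn.Linear(hidden, hidden, bias=False)
            self.Wx1 = nn.Linear(dim, hidden)
            self.Wz2 = nn.Linear(hidden, 1, bias=False)
            self.Wx2 = nn.Linear(dim, 1)

        def forward(self, x):
            z = F.softplus(self.Wx0(x))
            z = F.softplus(F.linear(z, self.Wz1.weight.clamp(min=0)) + self.Wx1(x))
            return F.linear(z, self.Wz2.weight.clamp(min=0)) + self.Wx2(x)

    def neural_bregman(f, x, y):
        # D_f(x, y) = f(x) - f(y) - <grad f(y), x - y>; non-negative since f is convex.
        y = y.detach().requires_grad_(True)
        fy = f(y)
        (grad_y,) = torch.autograd.grad(fy.sum(), y, create_graph=True)
        return (f(x) - fy).squeeze(-1) - ((x - y) * grad_y).sum(-1)

    f = ICNN(dim=8)
    x, y = torch.randn(32, 8), torch.randn(32, 8)
    d = neural_bregman(f, x, y)  # asymmetric, differentiable in f's parameters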

ProMIPS: Efficient High-Dimensional c-Approximate Maximum Inner Product Search with a Lightweight Index

TLDR
This paper projects high-dimensional points to low-dimensional ones via 2-stable random projections and derives probability-guaranteed search conditions under which the c-AMIP results are guaranteed in accuracy with arbitrary probability; it also proposes Quick-Probe to determine, in advance, a search bound satisfying the derived condition, avoiding an inefficient incremental search process.
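
The 2-stable property refers to the Gaussian distribution: a projection matrix with i.i.d. N(0, 1) entries maps any vector v to coordinates distributed as N(0, ||v||^2), so Euclidean geometry is preserved in distribution and with high probability. A small NumPy sketch of just this projection step (names and sizes are mine, not ProMIPS's):

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, n = 256, 16, 10_000           # original dim, projected dim, #points

    A = rng.standard_normal((m, d))     # i.i.d. N(0, 1): 2-stable projection
    X = rng.standard_normal((n, d))     # data points
    q = rng.standard_normal(d)          # query

    Xp, qp = X @ A.T, A @ q             # low-dimensional images
    # ||A v||^2 / m concentrates around ||v||^2 for any fixed v.
    est = np.linalg.norm(qp - Xp[0]) ** 2 / m
    true = np.linalg.norm(q - X[0]) ** 2
    print(est, true)                    # close, up to random fluctuation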

Reachable Distance Function for KNN Classification

TLDR
The reachable distance function is not a geometric straight-line distance between two data points; it takes the class attribute of the training dataset into account when measuring the affinity between data points, combining their class-center distance with their real distance.
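
The TLDR above does not reproduce the actual formula, so the following is a purely hypothetical illustration of the stated ingredients (a class-center term blended with the real, direct distance); the helper names and the weighting are mine, not the paper's definition:

    import numpy as np

    def class_centers(X, labels):
        # Mean vector of each class in the training set (hypothetical helper).
        return {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}

    def reachable_distance(q, x, x_label, centers, alpha=0.5):
        # Hypothetical blend of the direct distance and the distance from
        # the query to the center of x's class; NOT the paper's formula.
        direct = np.linalg.norm(q - x)
        to_center = np.linalg.norm(q - centers[x_label])
        return alpha * direct + (1.0 - alpha) * to_center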

Z Distance Function for KNN Classification

TLDR
The Z distance function is not a geometric straight-line distance between two data points; it takes the class attribute of the training dataset into account when measuring the affinity between data points.

References

Showing 1-10 of 52 references

Coresets and approximate clustering for Bregman divergences

TLDR
The first coreset construction for this problem is given for a large subclass of Bregman divergences, including important dissimilarity measures such as the Kullback-Leibler divergence and the Itakura-Saito divergence.
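
Both named measures are instances of the generic form D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> with different convex generators: the negative entropy phi(x) = sum_i x_i log x_i yields the Kullback-Leibler divergence, and the Burg entropy phi(x) = -sum_i log x_i yields the Itakura-Saito divergence. A brief NumPy check (my illustration):

    import numpy as np

    def bregman(phi, grad_phi, x, y):
        return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

    x, y = np.array([0.2, 0.3, 0.5]), np.array([0.4, 0.4, 0.2])

    # Negative entropy -> Kullback-Leibler divergence (x, y are distributions).
    kl = bregman(lambda v: np.sum(v * np.log(v)), lambda v: np.log(v) + 1.0, x, y)
    assert np.isclose(kl, np.sum(x * np.log(x / y)))

    # Burg entropy -> Itakura-Saito divergence: sum(x/y - log(x/y) - 1).
    is_div = bregman(lambda v: -np.sum(np.log(v)), lambda v: -1.0 / v, x, y)
    assert np.isclose(is_div, np.sum(x / y - np.log(x / y) - 1.0))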

Similarity Search on Bregman Divergence: Towards Non-Metric Indexing

TLDR
This paper devises a novel solution to handle this class of distance measures by expanding and mapping points from the original space into a new extended space, and shows how state-of-the-art tree-based indexing methods and vector approximation file (VA-file) methods can be adapted to this extended space to answer queries efficiently.

HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

TLDR
This paper proposes a novel yet simple indexing scheme, HD-Index, for approximate k-nearest-neighbor queries in massive high-dimensional databases, and uses the Ptolemaic inequality to produce better lower bounds.
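
For Ptolemaic metrics such as the Euclidean distance, Ptolemy's inequality d(q,x) * d(p1,p2) <= d(q,p1) * d(x,p2) + d(q,p2) * d(x,p1) rearranges into a pivot-based lower bound on d(q, x). A small sketch of that bound (the pivot bookkeeping here is mine; HD-Index's exact scheme differs):

    import numpy as np

    def ptolemaic_lb(d_q_p1, d_q_p2, d_x_p1, d_x_p2, d_p1_p2):
        # Lower bound on d(q, x) from precomputed distances to pivots p1, p2:
        #   d(q, x) >= |d(q,p1)*d(x,p2) - d(q,p2)*d(x,p1)| / d(p1,p2)
        # Candidates whose bound exceeds the current kth-NN distance can be
        # pruned without ever computing d(q, x).
        return abs(d_q_p1 * d_x_p2 - d_q_p2 * d_x_p1) / d_p1_p2

    rng = np.random.default_rng(1)
    q, x, p1, p2 = rng.standard_normal((4, 8))
    d = lambda a, b: np.linalg.norm(a - b)
    assert ptolemaic_lb(d(q, p1), d(q, p2), d(x, p1), d(x, p2), d(p1, p2)) <= d(q, x) + 1e-9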

Efficient Bregman Range Search

TLDR
An algorithm for efficient range search when the notion of dissimilarity is given by a Bregman divergence is developed, based on a recently proposed space decomposition for Bregman divergences.

I-LSH: I/O Efficient c-Approximate Nearest Neighbor Search in High-Dimensional Space

TLDR
This paper introduces an incremental-search-based c-ANN algorithm, named I-LSH, which adopts a more natural strategy of incrementally accessing the hash values of the objects, and provides rigorous theoretical analysis to underpin this incremental search strategy.

Fast nearest neighbor retrieval for bregman divergences

TLDR
The data structure introduced in this work shares the same basic structure as the popular metric ball tree, but employs convexity properties of Bregman divergences in place of the triangle inequality.

Nearest neighbor search on total bregman balls tree

TLDR
A new data structure and a new algorithm for NNS using the Total Bregman Divergence (TBD), which is invariant under rotation of the coordinate axes and allows for the definition of centers that are robust to noisy data.

That was fast! Speeding up NN search of high dimensional distributions

TLDR
A novel and efficient algorithm for deciding whether to explore nodes during backtracking, based on a variational approximation; it reduces the number of computations per node and overcomes the limitations of Bregman ball trees on high-dimensional data.

Approximate bregman near neighbors in sublinear time: beyond the triangle inequality

TLDR
The first provably approximate nearest neighbor (ANN) algorithms for Bregman divergences over a bounded domain are given, and polylog(n) bounds are obtained for a more abstract class of distance measures satisfying certain structural properties.

Distance Encoded Product Quantization for Approximate K-Nearest Neighbor Search in High-Dimensional Space

TLDR
This paper proposes a novel compact code representation that encodes both the cluster index and the quantized distance between a point and its cluster center in each subspace by distributing the bit budget, and extends the method to encode global residual distances in the original space.
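
A hedged sketch of the per-subspace encoding idea (the bit split and the distance quantizer below are illustrative choices, not the paper's exact allocation): store the nearest codebook index together with a few bits quantizing the point-to-center distance.

    import numpy as np

    def encode_subspace(v, centers, dist_edges):
        # Encode subvector v as (cluster index, quantized point-to-center
        # distance). `centers` is the subspace codebook; `dist_edges` are
        # bucket boundaries for the residual distance (illustrative).
        dists = np.linalg.norm(centers - v, axis=1)
        idx = int(np.argmin(dists))
        bucket = int(np.searchsorted(dist_edges, dists[idx]))
        return idx, bucket

    rng = np.random.default_rng(2)
    centers = rng.standard_normal((16, 4))   # 4 bits for the cluster index
    dist_edges = np.array([0.5, 1.0, 1.5])   # 2 bits for the distance bucket
    print(encode_subspace(rng.standard_normal(4), centers, dist_edges))
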
...