Finding k-Closest-Pairs Efficiently for High Dimensional Data

Abstract

We present a novel approach to reporting approximate as well as exact k-closest pairs for sets of high dimensional points under the $L_t$ metric, $t = 1, \ldots, \infty$. The proposed algorithms are efficient and simple to implement. They all use multiple shifted copies of the data points, sorted according to their position along a space-filling curve such as the Peano curve, in a way that allows us to make performance guarantees without assuming that the dimensionality $d$ is constant. The first algorithm computes an $O(d^{1+1/t})$ approximation to the $k$th closest pair distance in $O(d^2 n \log n + dk(d + \log k))$ time. Experimental results, obtained using various real data sets of varying dimensions, indicate that the approximation factor is much better in practice. In the second algorithm we use this approximation to find the exact $k$ closest pairs in $O(dM)$ additional time, where $M$ is the number of points in certain short subsegments of the space-filling curve. The exact algorithm is particularly efficient, and $M = O(k)$ can be guaranteed, for data sets that satisfy certain separation conditions. The proposed approach can be easily adapted to other proximity problems, including fixed-radius neighbor search, minimal k-point clustering, and nearest neighbor search.
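The idea of sorting shifted copies of the points along a space-filling curve can be illustrated with a minimal sketch. It is not the paper's algorithm and carries none of its guarantees: it substitutes the simpler Z-order (Morton) curve for the Peano curve, assumes non-negative integer coordinates below 2^bits, and uses a hypothetical shift schedule (d + 1 copies, each moved by a multiple of 2^bits / (d + 1) in every coordinate) and a hypothetical `window` parameter for how many curve neighbors to compare. All function names are illustrative.

```python
import heapq


def morton_key(point, bits=16):
    """Interleave the low `bits` bits of each coordinate (Z-order key)."""
    d = len(point)
    key = 0
    for b in range(bits):
        for i, c in enumerate(point):
            key |= ((c >> b) & 1) << (b * d + i)
    return key


def lt_dist(p, q, t=2):
    """L_t distance between p and q; t = float('inf') gives the max norm."""
    if t == float("inf"):
        return max(abs(a - b) for a, b in zip(p, q))
    return sum(abs(a - b) ** t for a, b in zip(p, q)) ** (1.0 / t)


def approx_k_closest_pairs(points, k, t=2, bits=16, window=1):
    """Return up to k candidate pairs as (distance, i, j), smallest first.

    Candidates are pairs that lie within `window` positions of each other
    in at least one of the d + 1 shifted Z-order sortings (an assumption
    made for this sketch, not a statement of the paper's procedure).
    """
    d = len(points[0])
    span = 1 << bits
    candidates = set()
    for j in range(d + 1):
        shift = (j * span) // (d + 1)
        # Shifted coordinates stay below 2^(bits + 1), so use one extra bit.
        order = sorted(
            range(len(points)),
            key=lambda i: morton_key([c + shift for c in points[i]], bits + 1),
        )
        for pos, a in enumerate(order):
            for b in order[pos + 1 : pos + 1 + window]:
                candidates.add((min(a, b), max(a, b)))
    scored = ((lt_dist(points[a], points[b], t), a, b) for a, b in candidates)
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (100, 100), (500, 500), (101, 100)]
    for dist, i, j in approx_k_closest_pairs(pts, k=2):
        print(dist, pts[i], pts[j])
```

Because the candidates are a subset of all pairs, the k-th distance returned here can only overestimate the true k-th closest pair distance; how tight that estimate is depends on the curve, the shifts, and the window, which is where the paper's analysis applies.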

Cite this paper

@inproceedings{Inaba2000FindingKE, title={Finding k-Closest-Pairs Efficiently for High Dimensional Data}, author={Mary Inaba and Hiroshi Imai}, booktitle={CCCG}, year={2000} }