Evaluating a Class of Distance-Mapping Algorithms for Data Mining and Clustering

Abstract

A distance-mapping algorithm takes a set of objects and a distance metric and maps those objects to a Euclidean or pseudo-Euclidean space in such a way that the distances among objects are approximately preserved. Distance-mapping algorithms are a useful tool for clustering and visualization in data-intensive applications, because they replace expensive distance calculations with sum-of-squares calculations. This can make clustering practical in large databases with expensive distance metrics. In this paper we present five distance-mapping algorithms and conduct experiments to compare their performance in data clustering applications. The five are two algorithms called FastMap and MetricMap, and three hybrid heuristics that combine the two in different ways. Experimental results on both synthetic and RNA data show the superiority of the hybrid algorithms. The results imply that FastMap and MetricMap capture complementary information about distance metrics and can therefore be used together to great benefit. The net effect is that multi-day computations may be done in minutes.
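To make the idea of a distance mapping concrete, here is a minimal Python sketch of a FastMap-style projection, following the commonly described formulation of FastMap rather than the code evaluated in the paper. The function names (`fastmap`, `mapped_distance`, `euclid`), the pivot-selection heuristic, and the clamping of negative residuals to zero are assumptions made for illustration. In particular, clamping sidesteps the pseudo-Euclidean case mentioned in the abstract, and MetricMap and the hybrid heuristics are not shown.

```python
import math


def fastmap(objects, dist, k):
    """Map each object to a k-dimensional point (illustrative sketch).

    `dist` is any (possibly expensive) distance function on two objects.
    """
    n = len(objects)
    if n == 0:
        return []
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, dim):
        # Squared original distance minus the squared differences along
        # the axes assigned so far. Clamped at 0 (an assumption) so that
        # non-Euclidean metrics cannot produce a negative value.
        d2 = dist(objects[i], objects[j]) ** 2
        for c in range(dim):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)

    for dim in range(k):
        # Pivot heuristic: start at object 0, jump to the farthest object,
        # then to the object farthest from that one.
        a = 0
        b = max(range(n), key=lambda j: residual_sq(a, j, dim))
        a = max(range(n), key=lambda j: residual_sq(b, j, dim))
        dab2 = residual_sq(a, b, dim)
        if dab2 == 0.0:
            break  # no residual distance left; remaining axes stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a, b.
            coords[i][dim] = (
                residual_sq(a, i, dim) + dab2 - residual_sq(b, i, dim)
            ) / (2.0 * dab)
    return coords


def mapped_distance(u, v):
    """Cheap sum-of-squares distance between two mapped points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def euclid(p, q):
    """Stand-in for an expensive metric; here plain Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


if __name__ == "__main__":
    points = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
    embedded = fastmap(points, euclid, 2)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            print(i, j, euclid(points[i], points[j]),
                  mapped_distance(embedded[i], embedded[j]))
```

Once the objects are mapped, a clustering routine can call `mapped_distance` on the coordinates instead of calling the original metric for every pair. For data that is already Euclidean, as in the toy example, the mapped distances should match the originals up to rounding; for a general metric they are only approximations.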

DOI: 10.1145/312129.312264

Statistics

[Figure: Citations per Year, 2001 to 2017]

Semantic Scholar estimates that this publication has 142 citations based on the available data.

Cite this paper

@inproceedings{Wang1999EvaluatingAC,
  title={Evaluating a Class of Distance-Mapping Algorithms for Data Mining and Clustering},
  author={Jason Tsong-Li Wang and Xiong Wang and King-Ip Lin and Dennis Shasha and Bruce A. Shapiro and Kaizhong Zhang},
  booktitle={KDD},
  year={1999}
}