Statistical approach to numerical databases: clustering using normalised Minkowski metrics

Abstract

Pre-processing or normalisation of data sets is widely used in a number of fields of machine intelligence. In contrast to the overwhelming majority of normalisation procedures, in which data are scaled to a unit range, this paper argues that after a data set is normalised, the average contributions of all features to the measure used to assess the similarity of the data should be equal to one another. Using the Minkowski distance as an example of a similarity metric, new normalised metrics are introduced such that the means of all attributes are the same and, hence, the contributions of the features to similarity measures are approximately equalised. This normalisation is achieved by scaling the numerical attributes, i.e. by dividing the database values by the means of the corresponding components of the metric.
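The scaling idea can be illustrated with a minimal sketch. The abstract does not give the exact formulae, so the following rests on an assumption: the "component of the metric" for feature k is taken to be |x_ik - x_jk|^p, its mean is taken over all pairs of records, and each feature is divided by the p-th root of that mean so that every feature's mean component becomes one. The function names `mean_minkowski_components`, `normalise_for_minkowski` and `minkowski` are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_minkowski_components(X, p=2.0):
    """Per-feature mean of |x_ik - x_jk|**p over all pairs of records i < j.

    Assumption: this is one reading of "the means of the components of
    the metric"; the paper may define the mean differently.
    """
    X = np.asarray(X, dtype=float)
    i, j = np.triu_indices(X.shape[0], k=1)  # all unordered pairs of rows
    return (np.abs(X[i] - X[j]) ** p).mean(axis=0)


def normalise_for_minkowski(X, p=2.0):
    """Scale each feature so its mean Minkowski component equals one."""
    X = np.asarray(X, dtype=float)
    means = mean_minkowski_components(X, p)
    if np.any(means == 0):
        raise ValueError("a constant feature cannot be scaled this way")
    # Dividing values by means**(1/p) divides |x_ik - x_jk|**p by the mean.
    return X / means ** (1.0 / p)


def minkowski(a, b, p=2.0):
    """Ordinary Minkowski distance of order p between two records."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float((diff ** p).sum() ** (1.0 / p))


# Illustrative data only: two features on very different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [4.0, 200.0],
              [3.0, 500.0]])
X_scaled = normalise_for_minkowski(X, p=2.0)
print(mean_minkowski_components(X, p=2.0))         # unequal before scaling
print(mean_minkowski_components(X_scaled, p=2.0))  # equalised after scaling
print(minkowski(X_scaled[0], X_scaled[1], p=2.0))
```

Under this reading, neither feature dominates the distance merely because of its units, which differs from unit-range scaling, where the average contribution still depends on how the values are spread within the range.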

Cite this paper

@inproceedings{Pham2006StatisticalAT, title={Statistical approach to numerical databases: clustering using normalised Minkowski metrics}, author={Duc Thang Pham and Yuriy I. Prostov and Maria M. Suarez-Alvarez}, year={2006} }