Outlier mining in large high-dimensional data sets

@article{Angiulli2005OutlierMI,
title={Outlier mining in large high-dimensional data sets},
author={Fabrizio Angiulli and Clara Pizzuti},
journal={IEEE Transactions on Knowledge and Data Engineering},
year={2005},
volume={17},
pages={203--215}
}

A new definition of distance-based outlier and an algorithm, called HilOut, designed to efficiently detect the top n outliers of a large and high-dimensional data set are proposed. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k nearest neighbors. Outliers are those points scoring the largest values of weight. The algorithm HilOut makes use of the notion of space-filling curve to linearize the data set, and it consists of two phases. The…
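
The weight definition in the abstract can be illustrated with a small brute-force sketch. This is not HilOut: it uses no space-filling curve and compares every pair of points, so it only suits small data sets. The function names `knn_weight` and `top_n_outliers` are hypothetical, and Euclidean distance is an assumption made here for illustration.

```python
import heapq
import math


def knn_weight(points, i, k):
    """Weight of points[i]: sum of distances to its k nearest neighbors.

    Assumes Euclidean distance (an illustrative choice).
    """
    # Distances from points[i] to every other point, smallest first.
    dists = sorted(
        math.dist(points[i], q) for j, q in enumerate(points) if j != i
    )
    return sum(dists[:k])


def top_n_outliers(points, k, n):
    """Return the n (weight, index) pairs with the largest weights.

    Brute force: O(N^2) distance computations, for illustration only.
    """
    weights = [(knn_weight(points, i, k), i) for i in range(len(points))]
    return heapq.nlargest(n, weights)


if __name__ == "__main__":
    # Four clustered points and one point far away from them.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    for weight, index in top_n_outliers(data, k=2, n=1):
        print(index, data[index], round(weight, 3))
```

With the sample data, the isolated point `(10, 10)` receives the largest weight because its two nearest neighbors are both far away, while each clustered point has two neighbors at distance 1.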