# Distance-based outliers: algorithms and applications

@article{Knorr2000DistancebasedOA, title={Distance-based outliers: algorithms and applications}, author={Edwin M. Knorr and Raymond T. Ng and Vladimir Tucakov}, journal={The VLDB Journal}, year={2000}, volume={8}, pages={237-253} }

Abstract. This paper deals with finding outliers (exceptions) in large, multidimensional datasets. The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. Existing methods that we have seen for finding outliers can only deal efficiently with two dimensions/attributes of a dataset. In this paper, we study the notion of DB (distance…

#### 1,128 Citations

Example-Based Outlier Detection for High Dimensional Datasets

- Computer Science
- 2005

Presents a novel solution to the problem of detecting outliers in high-dimensional datasets from user-provided examples: it discovers the hidden view of outliers and picks out further objects that stand out in the projection where the examples stand out most.

Class Outliers Mining: Distance-Based Approach

- Computer Science
- 2007

This research poses the problem of Class Outliers Mining, presents a method for finding such outliers, and proposes the Class Outlier Factor (COF), which measures the degree to which a data object is a class outlier.

Outliers Detection in Multi-label Datasets

- Computer Science
- MICAI
- 2020

This paper proposes a method that measures the degree of anomaly of an object in a multi-label dataset and quantifies the level of irregularity of that object with respect to the dataset.

Mining class outliers: concepts, algorithms and applications in CRM

- Computer Science
- Expert Syst. Appl.
- 2004

The notion of class outlier is developed, practical solutions are proposed by extending existing outlier detection algorithms to this case, and potential applications in CRM (customer relationship management) are discussed.

Outlier detection by example

- Computer Science
- Journal of Intelligent Information Systems
- 2010

This OBE (Outlier By Example) system is the first that allows users to provide examples of outliers in low-dimensional datasets and can discover values that a user would consider outliers.

A Scalable and Efficient Outlier Detection Strategy for Categorical Data

- Computer Science
- 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007)
- 2007

Attribute Value Frequency (AVF) is introduced, a fast and scalable outlier detection strategy for categorical data that scales linearly with the number of data points and attributes, and relies on a single data scan.
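As a rough illustration of the frequency-based idea in the summary above, the sketch below scores each categorical record by the mean frequency of its attribute values and treats the lowest-scoring records as outlier candidates. The function names `avf_scores` and `avf_outliers`, the exact scoring formula, and the in-memory two-pass layout are assumptions made for this example; they are not taken from this listing and may differ from the published method.

```python
from collections import Counter

def avf_scores(rows):
    """Score each row by the mean frequency of its attribute values.

    Assumption of this sketch: a lower score means the row is built from
    rarer values and is therefore a stronger outlier candidate.
    """
    n_attrs = len(rows[0])
    # One Counter per attribute column: value -> number of rows holding it.
    counts = [Counter(row[j] for row in rows) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in rows
    ]

def avf_outliers(rows, k):
    """Return the indices of the k rows with the lowest scores."""
    scores = avf_scores(rows)
    return sorted(range(len(rows)), key=lambda i: scores[i])[:k]

rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")]
print(avf_scores(rows))
print(avf_outliers(rows, 1))
```

The work is proportional to the number of rows times the number of attributes, which matches the linear scaling described in the summary. This version counts and then scores in two passes over an in-memory list rather than in a single data scan.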

Detection of outliers and outliers clustering on large datasets with distributed computing

- Computer Science
- 2012

This work presents several distributed computing algorithms for outlier detection, starting from a distributed version of an existing algorithm, CURIO, and introducing a series of optimizations and variants that lead to a new method, Curio3XD, which addresses two issues typical of this problem: the constraints imposed by the size and by the dimensionality of the datasets.

A Clustering Approach for Outliers Detection in a Big Point-of-Sales Database

- Computer Science
- 2019 International Conference on Machine Learning and Data Engineering (iCMLDE)
- 2019

A clustering-based approach to identifying outliers in a retail point-of-sales dataset is proposed; the experimental results show that the K-means algorithm outperforms the Fuzzy C-means (FCM) algorithm in terms of outlier detection efficiency and that it is an effective outlier detection solution.

Outlier mining in large high-dimensional data sets

- Mathematics, Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2005

An in-memory and disk-based implementation of the HilOut algorithm and a thorough scaling analysis for real and synthetic data sets showing that the algorithm scales well in both cases are presented.

#### References

SHOWING 1-10 OF 47 REFERENCES

Algorithms for Mining Distance-Based Outliers in Large Datasets

- Computer Science
- VLDB
- 1998

This paper provides formal and empirical evidence showing the usefulness of DB-outliers and presents two simple algorithms for computing such outliers, both having a complexity of O(k N^2), k being the dimensionality and N being the number of objects in the dataset.
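The kind of simple quadratic algorithm mentioned in the summary above can be illustrated with a nested-loop sketch. It assumes the usual DB(p, D) definition from this line of work: an object is an outlier if at least a fraction p of the dataset lies farther than distance D from it. The function name `db_outliers`, the Euclidean metric, and the choice not to count an object as its own neighbour are assumptions of this sketch, not details taken from this listing.

```python
import math

def db_outliers(points, p, d):
    """Return indices of DB(p, d)-outliers using a plain nested loop.

    Each distance costs O(k) for k-dimensional points and there are up to
    N^2 pairs, so the sketch is quadratic in the number of objects N.
    """
    n = len(points)
    # At least fraction p must be farther than d, so an outlier may have
    # at most n * (1 - p) objects within distance d.
    max_neighbours = n * (1 - p)
    outliers = []
    for i, a in enumerate(points):
        neighbours = 0
        for j, b in enumerate(points):
            if i != j and math.dist(a, b) <= d:
                neighbours += 1
                if neighbours > max_neighbours:
                    break  # too many close objects: not an outlier
        if neighbours <= max_neighbours:
            outliers.append(i)
    return outliers

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(db_outliers(points, p=0.75, d=2.0))
```

The early `break` stops scanning as soon as an object is known to have too many close neighbours, which helps in practice but does not change the worst-case cost.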

A unified approach for mining outliers

- Computer Science
- CASCON
- 1997

The proposed, intuitive notion of outliers can unify or generalize many of the existing notions of outliers provided by discordancy tests for standard statistical distributions, so that when mining large datasets containing many attributes, a unified approach can replace many statistical discordancy tests, regardless of any knowledge about the underlying distribution of the attributes.

A Unified Notion of Outliers: Properties and Computation

- Computer Science
- KDD
- 1997

A unified outlier detection system can replace a whole spectrum of statistical discordancy tests with a single module detecting only the kinds of outliers proposed.

BIRCH: an efficient data clustering method for very large databases

- Computer Science
- SIGMOD '96
- 1996

A data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is presented, and it is demonstrated that it is especially suitable for very large databases.

A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

- Computer Science
- KDD
- 1996

DBSCAN, a new clustering algorithm relying on a density-based notion of clusters and designed to discover clusters of arbitrary shape, is presented; it requires only one input parameter and supports the user in determining an appropriate value for it.

Fast Computation of 2-Dimensional Depth Contours

- Mathematics, Computer Science
- KDD
- 1998

A fast algorithm is given, FDC, which computes the first k 2-D depth contours by restricting the computation to a small selected subset of data points, instead of examining all data points.

A Linear Method for Deviation Detection in Large Databases

- Computer Science
- KDD
- 1996

The problem of finding deviations in large databases is described, a formal description of the problem is given, and a linear algorithm for detecting deviations is presented that uses the implicit redundancy of the data.

Efficient and Effective Clustering Methods for Spatial Data Mining

- 1994

Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have a role…

Efficient and Effective Clustering Methods for Spatial Data Mining

- Computer Science
- VLDB
- 1994

The analysis and experiments show that with the assistance of CLARANS, these two algorithms are very effective and can lead to discoveries that are difficult to find with current spatial data mining algorithms.

Fast Spatio-Temporal Data Mining of Large Geophysical Datasets

- Computer Science
- KDD
- 1995

Early experiences are presented with a prototype exploratory data analysis environment, CONQUEST, designed to provide content-based access to such massive scientific datasets, and several associated feature extraction algorithms implemented on MPP platforms.