On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study

@article{Campos2015OnTE,
  title={On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study},
  author={G. Campos and A. Zimek and J. Sander and Ricardo J. G. B. Campello and Barbora Micenkov{\'a} and Erich Schubert and I. Assent and M. E. Houle},
  journal={Data Mining and Knowledge Discovery},
  year={2015},
  volume={30},
  pages={891--927}
}
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate benchmark datasets with ground truth annotation is a significant impediment to the evaluation of outlier methods. Even when labeled datasets are available, their suitability for the outlier detection…
On the Evaluation of Outlier Detection: Measures, Datasets, and an Empirical Study Continued
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier…
An Unsupervised Approach for Combining Scores of Outlier Detection Techniques, Based on Similarity Measures
TLDR
Two novel unsupervised approaches using ensembles of heterogeneous types of detectors are proposed. Both approaches construct the ensemble using solely the results produced by each algorithm, identifying and giving more weight to the most suitable techniques depending on the particular dataset under examination.
Are Outlier Detection Methods Resilient to Sampling?
TLDR
This paper performs an extensive experimental study on synthetic and real-world datasets, studies seven diverse and representative outlier detection methods, compares results obtained from samples versus those obtained from the whole datasets, and evaluates the accuracy of the resilience estimates.
Fast and Scalable Outlier Detection with Metric Access Methods
TLDR
This paper proposes MetricABOD, a novel angle-based outlier detection algorithm that makes the analysis up to thousands of times faster while being on average 26% more accurate than the most accurate related work.
On normalization and algorithm selection for unsupervised outlier detection
TLDR
It is formally proven that normalization affects the nearest neighbor structure and density of the dataset, hence affecting which observations could be considered outliers. An instance space analysis of combinations of normalization and detection methods enables the visualization of the strengths and weaknesses of these combinations.
Contextual Outlier Interpretation
TLDR
A novel Contextual Outlier INterpretation (COIN) method is proposed to explain the abnormality of existing outliers spotted by detectors, and experimental results demonstrate the flexibility and effectiveness of the proposed framework compared with existing interpretation approaches.
A Comparison of Outlier Detection Techniques for High-Dimensional Data
TLDR
The recent advances in outlier detection for high-dimensional data are summarized, and an extensive experimental comparison to the popular detection methods on public datasets is made.
Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data
TLDR
This work proposes and validates a new and practical process for the benchmarking of unsupervised outlier detection and describes three instantiations of this generic process that generate outliers with specific characteristics, like local outliers.
Unsupervised Artificial Neural Networks for Outlier Detection in High-Dimensional Data
TLDR
This paper examines the usefulness of unsupervised artificial neural networks – autoencoders, self-organising maps and restricted Boltzmann machines – to detect outliers in high-dimensional data in a fully unsupervised way and shows that neural-based approaches outperform the current state of the art in terms of both runtime and accuracy.
A Large-scale Study on Unsupervised Outlier Model Selection: Do Internal Strategies Suffice?
TLDR
This work studies the feasibility of employing internal model evaluation strategies for selecting a model for outlier detection, and finds that none would be practically useful, as they select models only comparable to a state-of-the-art detector (with random configuration).

References

On the internal evaluation of unsupervised outlier detection
TLDR
An index called IREOS (Internal, Relative Evaluation of Outlier Solutions) is proposed for the internal evaluation of top-n (binary) outlier detection results. It can evaluate and compare different candidate labelings of a collection of multivariate observations in terms of outliers and inliers.
On Evaluation of Outlier Rankings and Outlier Scores
TLDR
A generalized view of evaluation methods is presented that allows both to evaluate the performance of existing methods as well as to compare different methods w.r.t. their detection performance.
Distance-based outlier detection
TLDR
A family of state-of-the-art distance-based outlier detection algorithms is evaluated, and a factorial design study highlights the important fact that no single optimization or combination of optimizations (factors) always dominates on all types of data.
Subsampling for efficient and effective unsupervised outlier detection ensembles
TLDR
Subsampling is proposed and studied as a technique to induce diversity among individual outlier detectors, and it is shown analytically and experimentally that an outlier detector based on a subsample per se can already improve upon the results of the same outlier detector on the complete dataset.
Outlier detection by active learning
TLDR
This paper presents a novel approach to outlier detection based on classification, which is superior to other methods based on the same reduction to classification but using standard classification methods, and shows that it is competitive with the state-of-the-art outlier detection methods in the literature.
Discriminative features for identifying and interpreting outliers
TLDR
An algorithm is proposed that uncovers outliers in subspaces of reduced dimensionality in which they are well discriminated from regular objects while at the same time retaining the natural local structure of the original data to ensure the quality of outlier explanation.
Statistical selection of relevant subspace projections for outlier ranking
TLDR
This work proposes a novel outlier ranking based on the object's deviation in a statistically selected set of relevant subspace projections and provides a selection of subspaces with high contrast to tackle the general challenges of detecting outliers hidden in subspaces of the data.
Angle-based outlier detection in high-dimensional data
TLDR
This paper proposes a novel approach named ABOD (Angle-Based Outlier Detection) and some variants assessing the variance in the angles between the difference vectors of a point to the other points and shows ABOD to perform especially well on high-dimensional data.
A survey on unsupervised outlier detection in high-dimensional numerical data
TLDR
This survey article discusses some important aspects of the ‘curse of dimensionality’ in detail and surveys specialized algorithms for outlier detection from both categories.
Mining Outliers with Ensemble of Heterogeneous Detectors on Random Subspaces
TLDR
This paper proposes a unified framework for combining different outlier detection algorithms that is very effective in detecting outliers in the real-world context compared to other ensemble and individual approaches.