Local subspace-based outlier detection using global neighbourhoods

@inproceedings{Stein2016LocalSO,
  title={Local subspace-based outlier detection using global neighbourhoods},
  author={Bas van Stein and Matthijs van Leeuwen and Thomas B{\"a}ck},
  booktitle={2016 IEEE International Conference on Big Data (Big Data)},
  year={2016},
  pages={1136--1142}
}
Outlier detection in high-dimensional data is a challenging yet important task, as it has applications in, e.g., fraud detection and quality control. State-of-the-art density-based algorithms perform well because they 1) take the local neighbourhoods of data points into account and 2) consider feature subspaces. In highly complex and high-dimensional data, however, existing methods are likely to overlook important outliers because they do not explicitly take into account that the data is often… 
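The abstract names two ingredients of density-based detectors: local neighbourhoods and feature subspaces. As a rough illustration of both, the sketch below computes the Local Outlier Factor (LOF) of Breunig et al. (listed in the references) restricted to a chosen subset of features. It is not the method proposed in this paper, whose description is cut off above. The function name `lof_scores` and its `subspace` parameter are hypothetical, and the sketch assumes Euclidean distance, exactly k neighbours per point (ties broken arbitrarily, whereas LOF proper uses the full k-distance neighbourhood) and no duplicate points.

```python
import numpy as np


def lof_scores(X, k=5, subspace=None):
    """Sketch of LOF computed on an optional subset of feature columns.

    Hypothetical helper for illustration; assumes Euclidean distance,
    exactly k neighbours per point and no duplicate rows.
    """
    X = np.asarray(X, dtype=float)
    if subspace is not None:
        # Keep only the selected feature columns (the "subspace").
        X = X[:, list(subspace)]
    n = X.shape[0]
    rows = np.arange(n)

    # Pairwise Euclidean distances; a point is not its own neighbour.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)

    # Indices of the k nearest neighbours of every point.
    knn = np.argsort(D, axis=1)[:, :k]
    # k-distance: distance to the k-th nearest neighbour.
    k_dist = D[rows, knn[:, -1]]

    # Reachability distance reach(p, o) = max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[knn], D[rows[:, None], knn])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = 1.0 / reach.mean(axis=1)

    # LOF: average neighbour density divided by the point's own density.
    return lrd[knn].mean(axis=1) / lrd


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], dtype=float)
    print(lof_scores(X, k=3).round(2))
```

Scores near 1 mean a point is about as dense as its neighbours; clearly larger scores flag points that are sparser than their neighbourhood. A subspace approach would run such a scorer on several feature subsets and aggregate the results; how those subsets are chosen is where methods such as HiCS differ.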


Outlier detection based on sparse coding and neighbor entropy in high-dimensional space
TLDR
Sparse coding and Neighbor entropy based Outlier Detection can detect both local and global outliers and construct neighbourhoods in a self-adaptive manner, and comparisons with state-of-the-art methods validate the advantages of the algorithm.
DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles
TLDR
An unsupervised outlier detector combination framework called DCSO is proposed, demonstrated and assessed for the dynamic selection of the most competent base detectors, with an emphasis on data locality.
Estimation of Locally Relevant Subspace in High-dimensional Data
TLDR
This paper presents a technique that identifies a locally relevant subspace and associated low-dimensional subspaces by deriving a final correlation score, and demonstrates the effectiveness of the technique in determining the generalised locally relevant subspace.
LSCP: Locally Selective Combination in Parallel Outlier Ensembles
TLDR
A framework called Locally Selective Combination in Parallel Outlier Ensembles (LSCP) is proposed, which addresses the problem of selecting reliable base detectors during model combination by defining a local region around a test instance using the consensus of its nearest neighbors in randomly selected feature subspaces.
Outlier Detection using AI: A Survey
TLDR
This survey aims to guide the reader to better understand recent progress of OD methods for the assurance of AI and discusses recent state-of-the-art approaches, their application areas, and performances.
Recent Progress of Anomaly Detection
TLDR
A comprehensive overview of existing work on anomaly detection, especially for data with high dimensionality and mixed types, where identifying anomalous patterns or behaviours is a nontrivial task.
Innovative Multi-Step Anomaly Detection Algorithm with Real-World Implementation: Case Study in Supply Chain Management
TLDR
A novel multi-step anomaly detection algorithm based on the greatest common divisor and median value is described which showed significant results in anomaly detection on company orders and improved a number of processes in the operation of the smart warehouse management system.
Statistical Validity and Consistency of Big Data Analytics: A General Framework
TLDR
The partition-repetition approach proposed here is broad enough to encompass all practical data analytic problems and has the potential to push forward advancement of Big Data analytics in the right direction.
Application of machine learning to characterize gas hydrate reservoirs in Mackenzie Delta (Canada) and on the Alaska north slope (USA)
Artificial neural network-trained models were used to predict gas hydrate saturation distributions in permafrost-associated deposits in the Eileen Gas Hydrate Trend on the Alaska North Slope (ANS),
Neighborhood Structure Assisted Non-negative Matrix Factorization and its Application in Unsupervised Point Anomaly Detection
TLDR
This work proposes to incorporate neighborhood structural similarity information into the NMF framework by modeling the data through a minimum spanning tree, based on the understanding that, in the presence of complicated data structure, a minimum spanning tree can approximate the intrinsic distance between two data points better than a simple Euclidean distance does.

References

Showing 1-10 of 24 references.
LOF: identifying density-based local outliers
TLDR
This paper contends that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier, called the local outlier factor (LOF), and gives a detailed formal analysis showing that LOF enjoys many desirable properties.
HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
TLDR
A novel subspace search method that selects high-contrast subspaces for density-based outlier ranking, together with a first measure for the contrast of subspace dimensions, is proposed to enhance the quality of traditional outlier rankings.
Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection
TLDR
A formalized method of analysis is provided to allow for a theoretical comparison and generalization of many existing methods and improves understanding of the shared properties and of the differences of outlier detection models.
Outlier Ranking via Subspace Analysis in Multiple Views of the Data
TLDR
This work proposes Outrank, a novel outlier ranking concept that exploits subspace analysis to determine the degree of outlierness, and outperforms state-of-the-art outlierness measures.
Outlier Detection in Arbitrarily Oriented Subspaces
In this paper, we propose a novel outlier detection model to find outliers that deviate from the generating mechanisms of normal instances by considering combinations of different subsets of
LOCI: fast outlier detection using the local correlation integral
TLDR
Experiments show that LOCI and aLOCI can automatically detect outliers and micro-clusters, without user-required cut-offs, and that they quickly spot both expected and unexpected outliers.
Incremental Local Outlier Detection for Data Streams
TLDR
The paper provides theoretical evidence that inserting a new data point or deleting an old one influences only a limited number of their closest neighbors, so the number of updates per insertion or deletion does not depend on the total number of points in the data set.
Distance-based outliers: algorithms and applications
TLDR
Outlier detection can be done efficiently for large datasets, and for k-dimensional datasets with large values of k, and it is shown that outlier detection is a meaningful and important knowledge discovery task.
Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data
TLDR
This work proposes an original outlier detection schema that detects outliers in varying subspaces of a high dimensional feature space and shows that it is superior to existing full-dimensional approaches and scales well to high dimensional databases.
LoOP: local outlier probabilities
TLDR
A local density-based outlier detection method that provides an outlier "score" in the range [0, 1] that is directly interpretable as the probability of a data object being an outlier.
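The LoOP entry describes a score in [0, 1] that reads as an outlier probability. The sketch below follows the LoOP definitions as we understand them: a probabilistic set distance pdist, equal to a significance factor λ times the quadratic mean distance to the k nearest neighbours; PLOF, the ratio of a point's pdist to the mean pdist of its neighbours, minus 1; normalisation by nPLOF = λ·sqrt(E[PLOF²]); and a Gaussian error function mapping. Treat it as an unverified sketch: the name `loop_scores`, the default λ = 3, excluding a point from its own context set and Euclidean distance are assumptions, and the normalisation divides by zero if every PLOF value is exactly 0.

```python
import math

import numpy as np


def loop_scores(X, k=5, lam=3.0):
    """Sketch of Local Outlier Probabilities (hypothetical helper).

    Assumes Euclidean distance, context set = k nearest neighbours
    (excluding the point itself) and at least one non-zero PLOF value.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rows = np.arange(n)

    # Pairwise distances; exclude each point from its own neighbourhood.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]

    # Standard (quadratic mean) distance to the context set, scaled by lambda.
    sigma = np.sqrt((D[rows[:, None], knn] ** 2).mean(axis=1))
    pdist = lam * sigma

    # PLOF: own pdist relative to the neighbours' average pdist, minus 1.
    plof = pdist / pdist[knn].mean(axis=1) - 1.0

    # Aggregate normalisation over the whole data set.
    nplof = lam * math.sqrt(float((plof ** 2).mean()))

    # Map to [0, 1] with the Gaussian error function, clipping negatives to 0.
    z = plof / (nplof * math.sqrt(2.0))
    return np.maximum(0.0, np.array([math.erf(v) for v in z]))


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10]], dtype=float)
    print(loop_scores(X, k=3).round(3))
```

Unlike a raw LOF value, the result is bounded, which is what makes it readable as a probability; the factor λ cancels inside PLOF and only controls how quickly scores approach 1.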