A Survey of Outlier Detection Methodologies

@article{Hodge2004ASO,
  title={A Survey of Outlier Detection Methodologies},
  author={Victoria J. Hodge and Jim Austin},
  journal={Artificial Intelligence Review},
  year={2004},
  volume={22},
  pages={85--126}
}
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as…
A Survey of Outlier Detection Methods in Network Anomaly Identification
TLDR
This paper surveys and compares well-known distance-based, density-based and other techniques for outlier detection, provides definitions of outliers, and discusses their detection with supervised and unsupervised learning in the context of network anomaly detection.
A Comparative Study of Outlier Detection Algorithms
TLDR
This paper presents a comprehensive analysis of three outlier detection methods: Extensible Markov Model (EMM), Local Outlier Factor (LOF) and LCS-Mine. The analysis covers time complexity and outlier detection accuracy.
Outlier Detection in Multiple Linear Regression
Outlier detection, as a branch of data mining, has many important applications and deserves more attention from the data mining community. Outliers are normally treated as noise that needs to be removed…
Outlier detection based on neighborhood proximity.
TLDR
A novel scheme for classifying and combining various outlier detectors in order to exploit their respective advantages is presented, and it is reported that this method yields better detection accuracy than existing ones on high-dimensional datasets.
Comparative Study of Outlier Detection Approaches
TLDR
This paper presents a study of the various algorithms used recently in the literature for outlier detection, classified as supervised, unsupervised and semi-supervised.
Different Outlier Detection Algorithms in Data Mining: A Review
An outlier is defined as an observation that deviates too much from other observations. The identification of outliers can lead to the discovery of useful and meaningful knowledge. Outlier detection has…
Robust and Unsupervised Anomaly Detection for Multivariate Dataset
Anomaly detection (also outlier detection [1]) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data [1]…
A simple sequential outlier detection with several residuals
  • J. Yoon
  • Computer Science
  • 2015 23rd European Signal Processing Conference (EUSIPCO)
  • 2015
TLDR
This paper focuses on sequential (on-line) outlier detection schemes based on the 'delete-replace' approach, and demonstrates that three different types of residuals can be used to design an outlier detection scheme that achieves accurate sequential estimation: marginal residual, conditional residual, and contribution.
A STUDY ON DIFFERENT APPROACHES OF OUTLIER DETECTION IN DATA MINING
Data mining is a process of extracting knowledge from large databases. Knowledge is regarded as the ultimate power nowadays and is considered a very important factor for the success of any…
Outlier Detection: Applications And Techniques
Outliers, once regarded as noisy data in statistics, have turned out to be an important problem that is being researched in diverse fields and application domains. Many outlier…

References

SHOWING 1-10 OF 85 REFERENCES
Outlier detection for high dimensional data
TLDR
New techniques for outlier detection which find the outliers by studying the behavior of projections from the data set are discussed.
Novelty detection using extreme value statistics
Extreme value theory is a branch of statistics that concerns the distribution of data of unusually low or high value, i.e. in the tails of some distribution. These extremal points are important in…
Robust Decision Trees: Removing Outliers from Databases
TLDR
This paper examines C4.5, a decision tree algorithm that is already quite robust (few algorithms have been shown to consistently achieve higher accuracy), and extends its pruning method to fully remove the effect of outliers, which results in improvement on many databases.
Algorithms for Mining Distance-Based Outliers in Large Datasets
TLDR
This paper provides formal and empirical evidence showing the usefulness of DB-outliers and presents two simple algorithms for computing such outliers, both having a complexity of O(k N^2), k being the dimensionality and N being the number of objects in the dataset.
Unsupervised Profiling Methods for Fraud Detection
Credit card fraud falls broadly into two categories: behavioural fraud and application fraud. Application fraud occurs when individuals obtain new credit cards from issuing companies using false…
Detecting graph-based spatial outliers: algorithms and applications (a summary of results)
TLDR
This paper defines statistical tests, analyzes the statistical foundation underlying the approach, designs several fast algorithms to detect spatial outliers, and provides a cost model for outlier detection procedures.
A Linear Method for Deviation Detection in Large Databases
TLDR
The problem of finding deviations in large databases is described, a formal description of the problem is given, and a linear algorithm for detecting deviations is presented that uses the implicit redundancy of the data.
Informal identification of outliers in medical data
TLDR
The removal of outliers increased the descriptive classification accuracy of discriminant analysis functions and the nearest neighbour method, while the predictive ability of these methods was somewhat reduced.
Efficient algorithms for mining outliers from large data sets
TLDR
A novel formulation for distance-based outliers is proposed, based on the distance of a point from its kth nearest neighbor: points are ranked by this distance, and the top n points in this ranking are declared to be outliers.
Procedures for Detecting Outlying Observations in Samples
Procedures are given for determining statistically whether the highest observation, the lowest observation, the highest and lowest observations, the two highest observations, the two lowest…