A Comparison of Outlier Detection Algorithms for Machine Learning
@inproceedings{Escalante2005ACO,
  title={A Comparison of Outlier Detection Algorithms for Machine Learning},
  author={Hugo Jair Escalante},
  year={2005}
}
This paper presents a comparison of outlier detection algorithms, comprising an overview of outlier detection methods and experimental results for six implemented methods. We applied these methods to the prediction of stellar population parameters as well as to machine learning benchmark data, inserting artificial noise and outliers. We used kernel principal component analysis to reduce the dimensionality of the spectral data. Experiments on noisy and noiseless data were…
59 Citations
An Outlier Detection Algorithm Based on Spectral Clustering
- Computer Science
- 2008 IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application
- 2008
The experimental results show that the outlier detection algorithm outperforms the K-means-based algorithm, achieving high precision, a low false alarm rate, and a desirable coverage ratio.
Detection and visualisation of outliers using kernel principal components
- Computer Science
- 2015 Fifth International Conference on Digital Information and Communication Technology and its Applications (DICTAP)
- 2015
A new method for identifying outliers in a dataset applies the K-means clustering algorithm to the smallest principal components produced by kernel principal component analysis.
A Comparative Evaluation of Supervised and Unsupervised Methods for Detecting Outliers
- Computer Science
- 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT)
- 2018
The paper examines the layout and performance of supervised and unsupervised outlier detection methods, using data mining tools such as RapidMiner and R.
Comparative Study of Outlier Detection Algorithms for Machine Learning
- Computer Science
- ICDLT '18
- 2018
The effects of multivariate outlier detection algorithms on machine learning problems are compared, and a comparative review identifies the advantages and disadvantages of each algorithm and its effect on the accuracy of SVM classifiers.
A Modified Density Based Outlier Mining Algorithm for Large Dataset
- Computer Science
- 2008 International Seminar on Future Information Technology and Management Engineering
- 2008
A modified density-based detection algorithm is presented that uses data partitioning and speedup strategies, such as the introduction of module information, to avoid a large number of unnecessary computations while finding outliers.
KNN Based Outlier Detection Algorithm in Large Dataset
- Computer Science
- 2008 International Workshop on Education Technology and Training & 2008 International Workshop on Geoscience and Remote Sensing
- 2008
A KNN-based outlier detection algorithm consisting of two phases: it first partitions the dataset into several clusters and then, within each cluster, computes the kth nearest neighbor of each object to find outliers.
EBOD: An ensemble-based outlier detection algorithm for noisy datasets
- Computer Science
- Knowl. Based Syst.
- 2021
An Efficient Outlier Mining Algorithm for Large Dataset
- Computer Science
- 2008 International Conference on Information Management, Innovation Management and Industrial Engineering
- 2008
An efficient KNN-based outlier mining algorithm is proposed that finds outliers more accurately by defining a correlation matrix that accounts for the importance of, and correlation between, attributes.
A Comparison of Outlier Detection Algorithm for Wireless Sensor Network
- Computer Science
- 2014
Experiments show that the proposed classification approach, which performs outlier detection and data classification simultaneously, outperforms other techniques in both effectiveness and efficiency.
Outlier Detection: Applications And Techniques
- Computer Science
- 2012
This paper attempts to bring together various outlier detection techniques, in a structured and generic description, to attain a better understanding of the different directions of research on outlier analysis for ourselves as well as for beginners in this research field.
References
SHOWING 1-10 OF 41 REFERENCES
Algorithms for Mining Distance-Based Outliers in Large Datasets
- Computer Science
- VLDB
- 1998
This paper provides formal and empirical evidence showing the usefulness of DB-outliers and presents two simple algorithms for computing such outliers, both having a complexity of O(k N^2), k being the dimensionality and N being the number of objects in the dataset.
A Unified Notion of Outliers: Properties and Computation
- Computer Science
- KDD
- 1997
A unified outlier detection system can replace a whole spectrum of statistical discordancy tests with a single module detecting only the kinds of outliers proposed.
Robust Decision Trees: Removing Outliers from Databases
- Computer Science
- KDD
- 1995
This paper examines C4.5, a decision tree algorithm that is already quite robust - few algorithms have been shown to consistently achieve higher accuracy, and extends the pruning method to fully remove the effect of outliers, and this results in improvement on many databases.
Efficient algorithms for mining outliers from large data sets
- Computer Science
- SIGMOD '00
- 2000
A novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor is proposed and the top n points in this ranking are declared to be outliers.
An introduction to kernel-based learning algorithms
- Computer Science
- IEEE Trans. Neural Networks
- 2001
This paper provides an introduction to support vector machines, kernel Fisher discriminant analysis, and kernel principal component analysis, as examples for successful kernel-based learning methods.…
Noise Clustering with a Fixed Fraction of Noise
- Computer Science
- 2004
The so-called noise clustering technique is modified making it more robust against a wrong choice of its main control parameter, the noise distance, including a computationally efficient algorithm.
Discovering Informative Patterns and Data Cleaning
- Computer Science
- Advances in Knowledge Discovery and Data Mining
- 1994
A method for discovering informative patterns from data that can be reduced to only a few representative data entries and an attractive candidate for new applications in knowledge discovery is presented.
Probabilistic noise identification and data cleaning
- Computer Science
- Third IEEE International Conference on Data Mining
- 2003
This work presents LENS, an approach for identifying corrupted fields and using the remaining noncorrupted fields for subsequent modeling and analysis, and provides an algorithm for the unsupervised discovery of such models.
Identifying and Eliminating Mislabeled Training Instances
- Environmental Science
- AAAI/IAAI, Vol. 1
- 1996
Empirical results suggest that the ensemble filter approach is an effective method for identifying labeling errors, and further, that the approach will significantly benefit ongoing research to develop accurate and robust remote sensing-based methods to map land cover at global scales.
Nonlinear Component Analysis as a Kernel Eigenvalue Problem
- Mathematics
- Neural Computation
- 1998
A new method for performing a nonlinear form of principal component analysis by the use of integral operator kernel functions is proposed and experimental results on polynomial feature extraction for pattern recognition are presented.
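
Illustrative sketch: one of the references above ("Efficient algorithms for mining outliers from large data sets") describes ranking points by the distance to their kth nearest neighbor and declaring the top n points to be outliers. The Python below is a minimal brute-force sketch of that ranking idea only, not the efficient algorithms from that paper and not the implementation compared in this paper. The function names, the Euclidean metric, and the toy data are assumptions made for illustration.

```python
import math


def kth_nn_distance(points, index, k):
    """Distance from points[index] to its k-th nearest neighbor (itself excluded).

    Assumption: Euclidean distance; the source summary does not fix a metric.
    """
    dists = sorted(
        math.dist(points[index], other)
        for j, other in enumerate(points)
        if j != index
    )
    return dists[k - 1]


def top_n_outliers(points, k, n):
    """Return the indices of the n points with the largest k-th NN distance.

    Brute force: every pairwise distance is computed, so the cost grows
    quadratically with the number of points.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = [kth_nn_distance(points, i, k) for i in range(len(points))]
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Hypothetical toy data: four clustered points and one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
outliers = top_n_outliers(data, k=2, n=1)
```

In this toy data the distant point at index 4 has by far the largest 2nd-nearest-neighbor distance, so it is the one ranked first. The choice of k and n is left to the user and strongly affects which points are flagged.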