Robust statistics for outlier detection

  • Peter J. Rousseeuw, Mia Hubert
  • Published 2011
  • Computer Science, Mathematics
  • Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
When analyzing data, outlying observations cause problems because they may strongly influence the result. […] We discuss robust procedures for univariate, low-dimensional, and high-dimensional data, such as estimation of location and scatter, linear regression, principal component analysis, and classification. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 73-79. DOI: 10.1002/widm.2. This article is categorized under: Algorithmic Development > Biological Data Mining…
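The univariate case can be made concrete with a small sketch that flags outliers using robust location and scale. It uses the median and the median absolute deviation (MAD), scaled by 1.4826 so that it estimates the standard deviation at normal data. The function name `mad_outliers`, the cutoff of 3, and the toy data are illustrative assumptions, not taken from the article.

```python
from statistics import mean, median, stdev

def mad_outliers(x, cutoff=3.0):
    """Flag values whose robust z-score |x - median| / MAD exceeds cutoff."""
    med = median(x)
    # 1.4826 * MAD is consistent for the standard deviation at the normal distribution
    mad = 1.4826 * median(abs(v - med) for v in x)
    if mad == 0:
        raise ValueError("MAD is zero; robust scale is undefined for this sample")
    return [abs(v - med) / mad > cutoff for v in x]

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0]
flags = mad_outliers(data)

# For comparison: the classical z-score of the suspicious value. Mean and SD are
# themselves pulled toward the outlier.
classical_z = (data[-1] - mean(data)) / stdev(data)
```

Here only 25.0 is flagged, with a robust z-score of about 100. Its classical z-score is only about 2.3, below the same cutoff, because the outlier inflates the standard deviation that is supposed to expose it.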

Outliers and Robustness for Ordinal Data

This chapter tackles the topics of robustness and multivariate outlier detection for ordinal data. We initially review outlier detection methods in regression for continuous data and give an example

There and back again: Outlier detection between statistical reasoning and data mining algorithms

From the joint point of view of data mining and statistics, the roots and development paths of statistical outlier detection and of database-related data mining methods for outlier detection are detailed.

Outlier Detection in Logistic Regression: A Quest for Reliable Knowledge from Predictive Modeling and Classification

  • A. Nurunnabi, G. West
  • Computer Science
  • 2012 IEEE 12th International Conference on Data Mining Workshops
  • 2012
A diagnostic measure based on a group-deletion approach for identifying multiple influential observations in logistic regression is introduced. It comes with a plotting technique that classifies data into outliers, high-leverage points, influential observations, and regular observations.

Multivariate Voronoi outlier detection for time series

The approach copes with outliers in a multivariate framework by designing and extracting effective attributes or features from the data, which can take parametric or nonparametric forms.

Evaluation of outlier detection method performance in symmetric multivariate distributions

Evaluating the blocked adaptive computationally efficient outlier nominators (BACON), the fast minimum covariance determinant (FAST-MCD), and the robust Mahalanobis distance (RM) method in multivariate data sets indicates that the performance of these methods varies according to the distribution type.

An Overview of Multiple Outliers in Multidimensional Data

An overview of multivariate outlier detection methods is provided because of its growing importance in a wide variety of practical situations and because the difficulty of detection increases with the number of outliers and the dimension of the data.

Data perturbation for outlier detection ensembles

Data perturbation is proposed as a new technique to induce diversity in individual outlier detectors. A rank accumulation method is also proposed that combines the individual outlier rankings to construct an outlier detection ensemble.
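The perturbation-plus-rank-accumulation idea can be sketched as follows. The assumptions are ours, not the paper's:

- the base detector is the distance to the k-th nearest neighbour;
- each ensemble member sees the data with small Gaussian noise added;
- the rankings are combined by averaging ranks.

All function names are hypothetical.

```python
import numpy as np

def knn_distance_score(X, k=5):
    """Outlier score: Euclidean distance to the k-th nearest neighbour (brute force)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # after sorting each row, column 0 is the point itself (distance 0)
    return np.sort(dist, axis=1)[:, k]

def perturbation_ensemble(X, n_members=20, noise=0.05, k=5, seed=0):
    """Average rank of each point over detectors run on noise-perturbed copies of X."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    rank_sum = np.zeros(n)
    for _ in range(n_members):
        # assumed perturbation: Gaussian noise proportional to each feature's spread
        Xp = X + rng.normal(scale=noise * X.std(axis=0), size=X.shape)
        scores = knn_distance_score(Xp, k)
        # convert scores to ranks: 0 = least outlying, n - 1 = most outlying
        rank_sum += np.argsort(np.argsort(scores))
    return rank_sum / n_members

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
X[0] = [6.0, 6.0]            # planted outlier
avg_rank = perturbation_ensemble(X)
```

Averaging ranks rather than raw scores puts members with differently scaled scores on a common footing. The planted point ends up with the highest average rank.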

Outlier Detection Based on Low Density Models

Results show that SDO competes satisfactorily with the best-ranked outlier detection alternatives. Its design makes it highly flexible and adaptable to stand-alone frameworks that must detect outliers quickly, with accuracy rates equivalent to those of lazy learning algorithms.

Regression and Outliers

It is pointed out that robust methods can and should be used for outlier detection, because outliers often contain additional information and are thus important.



High-Breakdown Robust Multivariate Methods

When applying a statistical method in practice it often occurs that some observations deviate from the usual assumptions. However, many classical methods are sensitive to outliers. The goal of robust

Unmasking Multivariate Outliers and Leverage Points

This work proposes to compute distances based on very robust estimates of location and covariance, better suited to expose the outliers in a multivariate point cloud, to avoid the masking effect.
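Robust distances in the spirit of this work can be sketched with the Minimum Covariance Determinant (MCD) as the robust estimator. The approximation uses random elemental starts followed by the concentration steps ("C-steps") of the FAST-MCD algorithm.

This is a simplification under stated assumptions:

- no consistency factor or reweighting step is applied;
- degenerate starts are skipped rather than enlarged;
- function names are hypothetical.

```python
import numpy as np

def mahalanobis_sq(X, mu, cov):
    """Squared Mahalanobis distance of each row of X to the fit (mu, cov)."""
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

def c_step_fit(X, idx, h, max_steps=50):
    """Apply C-steps from subset idx; return (det, mu, cov) of the last fit."""
    p = X.shape[1]
    for _ in range(max_steps):
        mu = X[idx].mean(axis=0)
        cov = np.cov(X[idx], rowvar=False)
        if np.linalg.matrix_rank(cov) < p:       # degenerate subset: abandon this start
            return np.inf, None, None
        # C-step: keep the h observations closest to the current fit
        new_idx = np.sort(np.argsort(mahalanobis_sq(X, mu, cov))[:h])
        if np.array_equal(new_idx, idx):         # subset is stable: converged
            break
        idx = new_idx
    return np.linalg.det(cov), mu, cov

def robust_distances(X, n_starts=200, seed=0):
    """Distances from an approximate raw MCD fit (no consistency factor or reweighting)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2                         # subset size with maximal breakdown value
    best_det, best_mu, best_cov = np.inf, None, None
    for _ in range(n_starts):
        start = np.sort(rng.choice(n, size=p + 1, replace=False))   # elemental start
        det, mu, cov = c_step_fit(X, start, h)
        if det < best_det:
            best_det, best_mu, best_cov = det, mu, cov
    return np.sqrt(mahalanobis_sq(X, best_mu, best_cov))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(80, 2)),                        # regular points
               rng.normal(loc=6.0, scale=0.3, size=(20, 2))])   # clustered outliers
robust = robust_distances(X)
classical = np.sqrt(mahalanobis_sq(X, X.mean(axis=0), np.cov(X, rowvar=False)))
cutoff = np.sqrt(-2 * np.log(0.025))   # sqrt of the chi-square 0.975 quantile, valid for p = 2 only
```

The clustered points receive large robust distances. Their classical distances stay modest because the cluster inflates the classical covariance estimate, which is the masking effect. Since the raw fit is not rescaled, some regular points may also exceed the cutoff; FAST-MCD's consistency correction and reweighting step address this.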

Robust Regression and Outlier Detection

This work presents the statistical treatment of outliers in regression, including the special case of one-dimensional location.

Robust Estimates of Location and Dispersion for High-Dimensional Datasets

An estimator of location and scatter based on a modified version of the Gnanadesikan–Kettenring robust covariance estimate is proposed, which is as good as or better than SD and FMCD at detecting outliers and other structures, with much shorter computing times.
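The Gnanadesikan–Kettenring idea rests on the identity cov(Y, Z) = (var(Y + Z) - var(Y - Z)) / 4. A robust version replaces the variances with squared robust scales. The sketch below builds only this pairwise matrix, using the MAD as the robust scale (an assumption) and hypothetical function names.

Such a pairwise matrix need not be positive semidefinite. The estimator proposed in this work modifies the construction to address that, and that modification is not reproduced here.

```python
import numpy as np

def mad(v):
    """Median absolute deviation, scaled to be consistent at the normal distribution."""
    return 1.4826 * np.median(np.abs(v - np.median(v)))

def gk_covariance(X, scale=mad):
    """Pairwise Gnanadesikan-Kettenring covariance matrix built from a robust scale."""
    p = X.shape[1]
    s = np.array([scale(X[:, j]) for j in range(p)])
    Z = X / s                                    # standardize each column by its robust scale
    C = np.empty((p, p))
    for j in range(p):
        for k in range(p):
            # cov(y, z) = (var(y + z) - var(y - z)) / 4, with robust scales squared
            C[j, k] = (scale(Z[:, j] + Z[:, k]) ** 2 - scale(Z[:, j] - Z[:, k]) ** 2) / 4
    return C * np.outer(s, s)                    # return to the original units

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=1000)
X[:50] = [8.0, -8.0]                             # 5% outliers against the correlation
C = gk_covariance(X)
robust_corr = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])
classical_corr = np.corrcoef(X, rowvar=False)[0, 1]
```

With 5% contamination the classical correlation changes sign. The pairwise robust estimate stays clearly positive, although the MAD itself is mildly affected by the outliers.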

A robust method for cluster analysis

Let there be given a contaminated list of n R^d-valued observations coming from g different, normally distributed populations with a common covariance matrix. We compute the ML-estimator with

An adjusted boxplot for skewed distributions

Fast and robust discriminant analysis

Computing LTS Regression for Large Data Sets

For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing algorithms for LTS and is faster by orders of magnitude.
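Least trimmed squares (LTS) minimizes the sum of the h smallest squared residuals. The concentration step (C-step) idea can be sketched for LTS as follows:

1. Fit ordinary least squares on a subset.
2. Keep the h observations with the smallest squared residuals.
3. Refit on those observations and repeat until the subset stops changing.

This is a simplified sketch, not FAST-LTS itself. It omits FAST-LTS's subsampling and nested extensions for large data sets, and the number of starts, h, and function name are assumptions.

```python
import numpy as np

def lts_fit(X, y, h=None, n_starts=100, max_steps=50, seed=0):
    """Approximate least trimmed squares (LTS) with elemental starts and C-steps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = (n + p + 1) // 2 if h is None else h     # number of residuals kept in the objective
    best_obj, best_beta = np.inf, None
    for _ in range(n_starts):
        idx = np.sort(rng.choice(n, size=p, replace=False))   # elemental start: p observations
        for _ in range(max_steps):
            beta = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
            r2 = (y - X @ beta) ** 2
            # C-step: refit on the h observations with the smallest squared residuals
            new_idx = np.sort(np.argsort(r2)[:h])
            if np.array_equal(new_idx, idx):
                break
            idx = new_idx
        obj = np.sort(r2)[:h].sum()              # LTS objective for the current fit
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta, best_obj

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=60)
y = 2.0 + 0.5 * x + rng.normal(scale=0.2, size=60)
x[:12] = rng.uniform(18, 20, size=12)            # 20% bad leverage points
y[:12] = rng.normal(scale=0.2, size=12)
X = np.column_stack([np.ones_like(x), x])        # intercept + slope design
beta_lts, _ = lts_fit(X, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

The leverage points drag the least squares slope far from 0.5, while the LTS slope stays close to it. Each C-step cannot increase the LTS objective, which is why a few steps from many cheap starts work well.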

Robust linear discriminant analysis using S‐estimators

The authors consider a robust linear discriminant function based on high breakdown location and covariance matrix estimators. They derive influence functions for the estimators of the parameters of

ROBPCA: A New Approach to Robust Principal Component Analysis

The ROBPCA approach, which combines projection pursuit ideas with robust scatter matrix estimation, yields more accurate estimates at noncontaminated datasets and more robust estimates at contaminated data.
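The projection-pursuit ingredient can be sketched in two parts. First, a Stahel–Donoho-type outlyingness is computed over directions through pairs of data points. Then classical PCA is applied to the least outlying part of the data.

The assumptions are:

- median and MAD serve as the univariate estimators;
- the kept fraction is 0.75;
- the number of random directions is arbitrary.

ROBPCA's own choices may differ, and its subsequent robust scatter estimation in the reduced subspace is omitted. Function names are hypothetical.

```python
import numpy as np

def outlyingness(X, n_dirs=250, seed=0):
    """Stahel-Donoho-type outlyingness: max robust z-score over directions through point pairs."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    out = np.zeros(n)
    for _ in range(n_dirs):
        i, j = rng.choice(n, size=2, replace=False)
        v = X[i] - X[j]
        norm = np.linalg.norm(v)
        if norm == 0:
            continue
        proj = X @ (v / norm)                    # project all points onto the direction
        med = np.median(proj)
        mad = 1.4826 * np.median(np.abs(proj - med))
        if mad > 0:
            out = np.maximum(out, np.abs(proj - med) / mad)
    return out

def robust_pcs(X, k, alpha=0.75):
    """Classical PCA on the alpha * n least outlying points (first-stage idea only)."""
    h = int(alpha * X.shape[0])
    keep = np.argsort(outlyingness(X))[:h]
    mu = X[keep].mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(X[keep], rowvar=False))
    order = np.argsort(eigval)[::-1][:k]         # largest eigenvalues first
    return mu, eigvec[:, order], eigval[order]

rng = np.random.default_rng(3)
d = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)       # true main direction
clean = rng.normal(scale=3.0, size=(85, 1)) * d + rng.normal(scale=0.1, size=(85, 3))
outliers = np.array([0.0, 0.0, 12.0]) + rng.normal(scale=0.1, size=(15, 3))
X = np.vstack([clean, outliers])
_, V, _ = robust_pcs(X, k=1)
classical_pc1 = np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1]   # eigh sorts ascending
```

The outliers dominate the classical first component, which points along the third axis. The component computed from the least outlying points stays aligned with the direction of the regular data.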