• Corpus ID: 212512355

EMPWC: Expectation Maximization with Particle Swarm Optimization based Weighted Clustering for Outlier Detection in Large Scale Data

J. Rajeswari, Ramalingam Gunasundari
Outlier detection is usually considered a pre-processing step for locating, in a data set, those objects that do not conform to well-defined notions of expected behaviour. It is very important in data mining for discovering novel or rare events, anomalies, vicious actions, exceptional phenomena, etc. However, outlier detection for categorical data sets is an especially challenging task because of the difficulty of defining a meaningful similarity measure. In addition, one…
1 Citation

Tutorial on EM Algorithm
This tutorial explains the EM algorithm to help researchers understand it, and proposes some improvements to the algorithm.


A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data
A novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data and introduces a theoretical analysis of the quality of approximation to guarantee the reliability of the estimation algorithm.
Semi-supervised outlier detection
This paper uses a limited amount of label information as supervision to detect outliers more accurately, with an objective function that penalizes poor clustering results and deviation from known labels and also restricts the number of outliers.
Detecting outliers using transduction and statistical testing
This paper presents a novel technique to detect outliers with respect to an existing clustering model based on Transductive Confidence Machines, which is capable of bootstrapping from a noisy data set a clean one that can be used to identify future outliers.
Statistical outlier detection using direct density ratio estimation
A new statistical approach to the problem of inlier-based outlier detection, i.e., finding outliers in the test set based on the training set consisting only of inliers, using the ratio of training and test data densities as an outlier score is proposed.
A fast outlier detection strategy for distributed high-dimensional data sets with mixed attributes
Experimental results show that the proposed outlier detection method compares very favorably with other state-of-the-art outlier detection strategies proposed in the literature, and that the speedup achieved by its distributed version is very close to linear.
Outlier Detection in Stream Data by Clustering Method
The aim of this study is to present a clustering-based algorithm for detecting outliers in stream data that concentrates on finding real outliers over a period of time, more so than other methods.
A unifying framework for detecting outliers and change points from time series
This paper presents a unifying framework for outlier detection and change point detection, which is learned incrementally using an online discounting learning algorithm and is compared with conventional methods to demonstrate its validity through simulation and experimental applications to incident detection in network security.
An Effective Pattern Based Outlier Detection Approach for Mixed Attribute Data
This work uses logistic regression to learn patterns and then formulate the outlier factor in mixed attribute datasets, which shows that POD performs statistically significantly better than several classic outlier detection methods.
FP-outlier: Frequent pattern based outlier detection
A new method to detect outliers by discovering frequent patterns (or frequent itemsets) from the data set by defining a measure called FPOF (Frequent Pattern Outlier Factor) and proposing the FindFPOF algorithm to discover outliers.
Fast Lightweight Outlier Detection in Mixed-Attribute Data
The empirical results demonstrate that while the technique only shows marginal improvements in detection rates, its execution speed and memory usage are far better than those of current state-of-the-art outlier detection techniques.