Outlier Detection and Removal Algorithm in K-Means and Hierarchical Clustering

Anwesha Barai and Lopamudra Dey
An outlier is a pattern that is dissimilar to the rest of the patterns in a dataset. Outlier detection is an important issue in data mining; it is used to detect and remove anomalous objects from data. Outliers arise from mechanical faults, changes in system behavior, fraudulent behavior, and human error. This paper describes a methodology for detecting and removing outliers in K-Means and hierarchical clustering. First, the clustering algorithms K-Means and hierarchical clustering are applied to a data…
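The abstract is cut off before the detection rule is stated, so the sketch below does not reproduce the authors' method. It illustrates one common heuristic for the hierarchical half of the idea, under stated assumptions: points are merged by single-linkage agglomerative clustering until the closest pair of clusters is farther apart than a cut distance, and any point left in a cluster smaller than `min_size` is treated as an outlier and removed. The function names and the `cut` and `min_size` parameters are hypothetical.

```python
import math

def single_linkage(points, cut):
    """Agglomerative single-linkage clustering, stopped once clusters are > cut apart."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point

    def link(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, x, y = min(
            (link(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > cut:  # every remaining pair of clusters is farther apart than the cut
            break
        clusters[x].extend(clusters[y])  # merge y into x; y > x, so index x stays valid
        del clusters[y]
    return clusters

def remove_small_cluster_outliers(points, cut, min_size=2):
    """Hypothetical rule: points in clusters smaller than min_size are outliers."""
    clusters = single_linkage(points, cut)
    outliers = sorted(i for c in clusters if len(c) < min_size for i in c)
    flagged = set(outliers)
    kept = [p for i, p in enumerate(points) if i not in flagged]
    return kept, outliers

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
kept, outliers = remove_small_cluster_outliers(pts, cut=2.0)
print(outliers)  # [6]
```

Single linkage is used here because an isolated point is only merged once the stopping distance is large, so it tends to remain a singleton; the naive loop is cubic in the number of points and is meant for small illustrations only.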
An Improved Overlapping Clustering Algorithm to Detect Outlier
Outlier detection is incorporated into the MCOKE algorithm so that it can detect and remove outliers that would otherwise participate in the calculation that assigns objects to one or more clusters.
A LOF K-Means Clustering on Hotspot Data
K-Means is the most popular clustering method, but its drawback is its sensitivity to outliers. This paper discusses adding an outlier removal method to K-Means to improve the…
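The summary does not give the paper's exact pipeline. A minimal sketch of the Local Outlier Factor (LOF) step that the title refers to, assuming the textbook LOF definition with Euclidean distance and distinct points (duplicate points would make a reachability sum zero), could look like the following. The threshold of 1.5 and the name `lof_filter` are hypothetical; the surviving points would then be passed to K-Means. `k` must be smaller than the number of points.

```python
import math

def lof_scores(points, k=2):
    """Local Outlier Factor of every point (Euclidean distance, distinct points assumed)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dist, neighbors = [], []
    for i in range(n):
        # k-distance: distance to the k-th nearest other point.
        kd = sorted(dist[i][j] for j in range(n) if j != i)[k - 1]
        k_dist.append(kd)
        # k-neighborhood: every other point within the k-distance (ties included).
        neighbors.append([j for j in range(n) if j != i and dist[i][j] <= kd])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [sum(lrd[o] for o in neighbors[i]) / (len(neighbors[i]) * lrd[i]) for i in range(n)]

def lof_filter(points, k=2, threshold=1.5):
    """Hypothetical pre-processing step: drop points whose LOF exceeds threshold."""
    scores = lof_scores(points, k)
    return [p for p, s in zip(points, scores) if s <= threshold], scores

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
inliers, scores = lof_filter(pts)
print(inliers)  # [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
```

Points inside a uniformly dense region score close to 1, while a point whose neighbors are much denser than its own surroundings scores well above 1, which is what makes a fixed cut above 1 a reasonable first choice.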
Evaluation Of Outlier Detection For Trajectory Data
An outlier in a trajectory dataset is a trajectory that differs from the others in the dataset. Outliers arise from human error, sensor or mechanical faults, and system behavior or the environment. It…
FilterK: A new outlier detection method for k-means clustering of physical activity
The new outlier detection method mainly aims to improve the cluster purity of physical activity accelerometer data, but the authors suggest that it could also be applied to other types of datasets clustered with k-means.
eHMCOKE: an enhanced overlapping clustering algorithm for data analysis
Received Apr 17, 2020; revised May 20, 2021; accepted Jun 15, 2021. Improved multi-cluster overlapping k-means extension (IMCOKE) uses the median absolute deviation (MAD) to detect outliers in datasets.
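The summary does not say how IMCOKE applies MAD. One widely used MAD-based rule is the modified z-score of Iglewicz and Hoaglin, M = 0.6745 (x - median) / MAD, with values flagged when |M| exceeds 3.5. The sketch below implements that rule for a single numeric attribute; the function name and the 3.5 cutoff are assumptions, not details taken from the paper.

```python
import statistics

def mad_outliers(values, cutoff=3.5):
    """Indices of values whose modified z-score exceeds the cutoff."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero, so the modified z-score is undefined")
    # 0.6745 is the 0.75 quantile of the standard normal, so MAD / 0.6745 estimates sigma.
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > cutoff]

print(mad_outliers([10, 11, 9, 10, 12, 10, 50]))  # [6]
```

Because both the center (median) and the spread (MAD) are robust statistics, a single extreme value such as 50 barely moves them, unlike a mean and standard deviation.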
Analyzing rare event, anomaly, novelty and outlier detection terms under the supervised classification framework
A one-to-one assignment of terms to learning scenarios is proposed, so that each learning scenario is associated with the term most frequently used in the literature.
Toward semantic data imputation for a dengue dataset
The work improves the efficiency of predicting missing data with Particle Swarm Optimization (PSO), applied to the numerical data cleansing problem, and enhances PSO's performance by using K-means to help determine the fitness value.
Objective-Based Hierarchical Clustering of Deep Embedding Vectors
A new practical hierarchical clustering algorithm, B++&C, is proposed; on average it improves the (normalized) Moseley-Wang (MW) and Cohen-Addad et al. (CKMM) objectives by 5% and 20%, respectively, compared with a wide range of classic methods and recent heuristics.
The Identification of Diabetes Mellitus Subtypes Applying Cluster Analysis Techniques: A Systematic Review
Cluster analysis has revealed non-classic heterogeneity in diabetes, but its capabilities still need to be explored and validated in larger and more diverse populations.
A new method for fault detection of aero-engine based on isolation forest
The proposed dynamic-threshold method for aero-engine fault detection, based on Isolation Forest, is shown to achieve high detection accuracy with a short running time.
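The summary does not describe the dynamic threshold, so the sketch below covers only the standard Isolation Forest scoring that the method builds on: random axis-parallel splits isolate anomalous points in fewer steps, and the score s(x) = 2^(-E[h(x)] / c(psi)) approaches 1 for such points. It is a plain-Python illustration with a seeded random generator; all names and default parameters are assumptions, and in practice a library implementation such as scikit-learn's `IsolationForest` would normally be used.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def avg_path(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n

def build_tree(rows, depth, limit, rng):
    # Stop at the height limit or when the node cannot be split further.
    if depth >= limit or len(rows) <= 1:
        return ("leaf", len(rows))
    dim = rng.randrange(len(rows[0]))  # random attribute
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)  # random split value between min and max
    left = [r for r in rows if r[dim] < split]
    right = [r for r in rows if r[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, limit, rng),
            build_tree(right, depth + 1, limit, rng))

def path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        # A leaf holding several points adds their expected remaining depth c(size).
        return depth + avg_path(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)

def iforest_scores(points, n_trees=100, sample_size=256, seed=0):
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    limit = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, limit, rng) for _ in range(n_trees)]
    c = avg_path(psi)
    # s(x) = 2 ** (-E[h(x)] / c(psi)); values close to 1 suggest anomalies.
    return [2 ** (-sum(path_length(p, t) for t in trees) / n_trees / c) for p in points]

pts = [(i % 5, i // 5) for i in range(20)] + [(100, 100)]
scores = iforest_scores(pts)
```

A fixed cut on these scores is the simplest way to turn them into fault flags; the paper's contribution is a dynamic threshold, which is not reproduced here.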


Outlier Detection over Data Set Using Cluster-Based and Distance-Based Approach
The proposed outlier detection method has a lower computational cost and performs better than the distance-based method; it efficiently prunes the safe cells (inliers) and saves a large number of extra calculations.
A Study of Clustering Based Algorithm for Outlier Detection in Data streams
Recently, many researchers have focused on mining data streams and have proposed many techniques and algorithms for them. Data stream mining refers to the process of extracting knowledge from nonstop, fast…
Improving K-Means by Outlier Removal
An Outlier Removal Clustering (ORC) algorithm is presented that performs outlier detection and data clustering simultaneously and achieves lower error than competing methods on datasets with overlapping clusters.
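As generally described, ORC first runs K-Means, then computes each point's distance d_i to its centroid, divides it by the largest such distance d_max to get an outlyingness o_i = d_i / d_max, removes points with o_i above a threshold T, and re-clusters the remaining data, repeating for a chosen number of iterations. The sketch below follows that scheme under simplifying assumptions: plain-Python Lloyd iterations, caller-supplied initial centroids (for determinism), and hypothetical parameter names `threshold` and `rounds`. It is not the authors' code.

```python
import math

def assign(points, centroids):
    # Index of the nearest centroid for every point.
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]

def kmeans(points, centroids, iters=20):
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Mean of the members; keep the old centroid if a cluster is empty.
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else old)
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, assign(points, centroids)

def orc(points, centroids, threshold=0.5, rounds=1):
    data = list(points)
    for _ in range(rounds):
        centroids, labels = kmeans(data, centroids)
        dists = [math.dist(p, centroids[lab]) for p, lab in zip(data, labels)]
        d_max = max(dists)
        if d_max == 0:
            break
        # Outlyingness o_i = d_i / d_max; drop points with o_i > threshold.
        kept = [p for p, d in zip(data, dists) if d / d_max <= threshold]
        if not kept:
            break
        data = kept
    centroids, _ = kmeans(data, centroids)
    return data, centroids

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (30, 0)]
data, cents = orc(pts, [(0, 0), (10, 10)])
print(cents)  # [(0.5, 0.5), (10.5, 10.5)]
```

With T < 1 the farthest point always has o_i = 1 and is removed in every round, so the number of rounds has to be chosen with care; in this example a second round would discard the remaining points, which all sit at the same distance from their centroids.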
Cluster Based Outlier Detection Algorithm for Healthcare Data
Results show that the cluster-based outlier detection algorithm provides better accuracy than the distance-based outlier detection algorithm in detecting and removing outliers.
A New Procedure of Clustering Based on Multivariate Outlier Detection
Clustering is an extremely important task in a wide variety of application domains, especially in management and social science research. In this paper, an iterative clustering procedure…
Canonical PSO Based K-Means Clustering Approach for Real Datasets
A canonical PSO-based K-means clustering algorithm is proposed; important clustering indices (inter-cluster and intra-cluster) are analyzed, and their effects are evaluated on real-time air pollution, wholesale customer, wine, and vehicle datasets.
A Review of K-mean Algorithm
Three different modified k-means algorithms are discussed that remove limitations of the k-means algorithm and improve its speed and efficiency.
A Systematic Review of Outliers Detection Techniques in Medical Data - Preliminary Study
Outlier detection techniques can be used to detect abnormal patterns in health records, contributing to better data and better knowledge for decision-making.
Combining and comparing clustering and layout algorithms
Many clustering and layout techniques have been used for structuring and visualising complex data. This paper explores a number of combinations and variants of sampling, K-means clustering and spring…
Outlier Detection using Clustering Methods: a data cleaning application