Standardization and Its Effects on K-Means Clustering Algorithm

  • Ismail Mohamad, Dauda Usman
  • Published 2013
  • Computer Science
  • Research Journal of Applied Sciences, Engineering and Technology
Data clustering is an important data exploration technique with many applications in data mining. [...] Key result: A comparison on infectious disease datasets found that the z-score standardization method was more effective and efficient than the min-max and decimal scaling standardization methods.
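The three standardization methods named in the abstract can be sketched as follows. This is a minimal illustration using their usual textbook definitions, not the authors' implementation. The function names are hypothetical, the z-score uses the population standard deviation (an assumption; the paper may use the sample version), and the decimal scaling exponent is restricted to non-negative integers.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Assumption: a constant attribute is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map the observed range onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute is mapped to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest non-negative integer
    such that every scaled absolute value is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


if __name__ == "__main__":
    # Hypothetical attribute values, for illustration only.
    sample = [200.0, 400.0, 600.0]
    print(z_score(sample))
    print(min_max(sample))
    print(decimal_scaling(sample))
```

Because K-means relies on Euclidean distance, an attribute with a large numeric range can dominate the distance unless each attribute is rescaled in one of these ways before clustering.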
Review on Optimal Data Analysis Based on New Projection-Based K-Means Initialization Clustering Algorithm
The proposed algorithm first uses the standard Gaussian kernel density estimation technique to find the high-density data areas in one dimension, then iteratively applies density estimation from the lower-variance dimensions to the higher-variance ones until all dimensions are computed.
Data clustering is the technique of partitioning data into different groups; in data mining these groups are known as clusters [1]. Data elements are clustered into different groups based [...]
Implementation of spectral clustering on microarray data of carcinoma using k-means algorithm
The major advantage of spectral clustering is that it reduces data dimensionality; in this case, it reduces the dimension of a large microarray dataset.
International Journal of Scientific Research in Computer Science, Engineering and Information Technology
Data mining is the technique of extracting useful information from raw data. As the volume of data increases day by day, it becomes difficult for us to analyze such a huge [...]
Application of the k-medoids Partitioning Algorithm for Clustering of Time Series Data
A comprehensive analysis of the applicability of a standard clustering algorithm, k-medoids, to two diverse time series datasets: dynamic power responses of a hybrid renewable energy source plant and neuroscience spike-train data.
Comparative Performance Analysis of K-Means and DBSCAN Clustering algorithms on various platforms
In this study, the K-means and DBSCAN clustering algorithms are run on selected datasets on four popular platforms (Python, Matlab, R and Wolfram Mathematica), showing that an algorithm's execution time differs from platform to platform.
Classification Performance Improvement Using Random Subset Feature Selection Algorithm for Data Mining
An attempt is made to improve the performance and stability of the existing RSFS algorithm for dimensionality reduction. The improved algorithm is superior at reducing dimensionality and improving classification accuracy when used with a simple kNN classifier.
Application of Agglomerative Hierarchical Clustering for Clustering of Time Series Data
An investigation of the standard agglomerative hierarchical clustering algorithm on two time series datasets, from the electric power system and neuroscience areas, shows that the effectiveness of the clustering algorithm depends to a large extent on the main characteristics of the clustered data and on the algorithm's parameters.
Avoiding common pitfalls when clustering biological data
This article reviews common pitfalls identified in the published molecular biology literature and presents methods to avoid them. It also discusses ensemble clustering as an easy-to-implement method that enables the exploration of multiple clustering solutions and improves their robustness.
A modified self-updating clustering algorithm for application to dengue gene expression data
It was demonstrated that the proposed approach does not require the number of clusters a priori, the convergence of the proposed algorithm was proved, and the algorithm was superior to the other algorithms compared.


Impact of Outlier Removal and Normalization Approach in Modified k-Means Clustering Algorithm
This paper analyzes the performance of a modified k-means clustering algorithm combined with data preprocessing techniques, including a cleaning method, a normalization approach and outlier detection, with automatic initialization of seed values, on datasets from the UCI dataset repository.
Data clustering: a review
An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
A study of standardization of variables in cluster analysis
A methodological problem in applied clustering involves the decision of whether or not to standardize the input variables prior to the computation of a Euclidean distance dissimilarity measure.
Data Mining: A Preprocessing Engine
This study focused on different types of normalization, each of which was tested against the ID3 methodology using the HSV data set, and made recommendations on the best normalization method based on the factors and their priorities.
Discovering Knowledge in Data: An Introduction to Data Mining
The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis. Includes new chapters on [...]
Data Mining: A Knowledge Discovery Approach
This comprehensive textbook on data mining details the unique steps of the knowledge discovery process that prescribes the sequence in which data mining projects should be performed, from problem and [...]
Feature normalization and likelihood-based similarity measures for image retrieval
The effects of five feature normalization methods on retrieval performance are discussed, and two likelihood ratio-based similarity measures that perform significantly better than the commonly used geometric approaches like the Lp metrics are described.
Impact of normalization in distributed K-means clustering
  • Int. J. Soft Comput.,
  • 2009
Algorithms for Clustering Data