Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values
  • J. Huang
  • Computer Science
  • Data Mining and Knowledge Discovery
  • 1 September 1998
The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real-world data containing …
  • 1,894
  • 174
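The abstract is cut off before the method. This paper is known for the k-modes extension: records are compared with a simple matching dissimilarity (the number of attributes on which two records differ), and cluster centers are modes rather than means. Below is a minimal sketch of that loop. It assumes records are tuples of category strings and that the caller supplies the initial modes. The paper's own initialization and its k-prototypes variant for mixed numeric and categorical data are not reproduced.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Number of attributes on which two categorical records differ."""
    return sum(a != b for a, b in zip(x, y))


def update_mode(members: List[Record]) -> Record:
    """Most frequent category per attribute (ties go to the first value seen)."""
    return tuple(
        Counter(m[j] for m in members).most_common(1)[0][0]
        for j in range(len(members[0]))
    )


def k_modes(data: List[Record], initial_modes: List[Record], max_iter: int = 100):
    """Alternate assignment and mode updates until the labels stop changing."""
    modes = list(initial_modes)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment: nearest mode under matching dissimilarity (ties -> lowest index).
        new_labels = [
            min(range(len(modes)), key=lambda c: matching_dissimilarity(x, modes[c]))
            for x in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update: recompute the mode of every non-empty cluster.
        for c in range(len(modes)):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                modes[c] = update_mode(members)
    return labels, modes
```

Because a mode is built from the members' most frequent categories, every cluster center is itself a valid categorical record. A mean would have no such meaning for categorical data.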
An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data
This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the …
  • 485
  • 71
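The abstract stops before the weighting scheme. Suppose, as the title suggests, that each cluster's variable weights are chosen to minimize the weighted within-cluster dispersion plus gamma times the negative entropy of the weights, sum(w_j * D_j) + gamma * sum(w_j * log w_j) with sum(w_j) = 1. The minimizer is then w_j proportional to exp(-D_j / gamma). Variables that are compact within the cluster receive most of the weight and so define its subspace. Below is a sketch of that update under this assumed objective. The function names and the parameter gamma are illustrative.

```python
import math
from typing import List, Sequence


def cluster_dispersions(points: Sequence[Sequence[float]], center: Sequence[float]) -> List[float]:
    """Per-variable sum of squared deviations of one cluster's points from its center."""
    return [sum((p[j] - center[j]) ** 2 for p in points) for j in range(len(center))]


def entropy_weights(dispersions: Sequence[float], gamma: float) -> List[float]:
    """Minimizer of sum(w*D) + gamma*sum(w*log w) over weights that sum to 1."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    d_min = min(dispersions)
    # Subtracting d_min cancels in the normalization but avoids underflow.
    exps = [math.exp(-(d - d_min) / gamma) for d in dispersions]
    total = sum(exps)
    return [e / total for e in exps]
```

A larger gamma pushes the weights toward uniform. A smaller gamma concentrates them on the most compact variables.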
Automated variable weighting in k-means type clustering
This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced to the k-means clustering process to iteratively update variable …
  • 594
  • 55
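The abstract describes an extra step that updates variable weights between the usual assignment and center updates. Below is a minimal sketch of one closed form for that step, which is assumed here rather than quoted from the truncated text. Let D_j be the total within-cluster squared deviation of variable j, and let beta > 1 be an exponent. Minimizing sum_j w_j**beta * D_j subject to sum_j w_j = 1 gives w_j = 1 / sum_t (D_j / D_t)**(1 / (beta - 1)). The sketch requires every D_j > 0 and omits the rest of the k-means loop.

```python
from typing import List, Sequence


def variable_dispersions(
    data: Sequence[Sequence[float]],
    labels: Sequence[int],
    centers: Sequence[Sequence[float]],
) -> List[float]:
    """D_j: squared deviation of variable j from each object's cluster center, summed."""
    return [
        sum((x[j] - centers[lab][j]) ** 2 for x, lab in zip(data, labels))
        for j in range(len(centers[0]))
    ]


def update_weights(dispersions: Sequence[float], beta: float) -> List[float]:
    """w_j = 1 / sum_t (D_j / D_t) ** (1 / (beta - 1)); the weights sum to 1."""
    if beta <= 1:
        raise ValueError("this sketch assumes beta > 1")
    if any(d <= 0 for d in dispersions):
        raise ValueError("this sketch assumes every dispersion is positive")
    p = 1.0 / (beta - 1.0)
    return [1.0 / sum((dj / dt) ** p for dt in dispersions) for dj in dispersions]


def weighted_sq_distance(x: Sequence[float], z: Sequence[float], w: Sequence[float], beta: float) -> float:
    """Distance the assignment step would use once the weights are known."""
    return sum((wj ** beta) * (xj - zj) ** 2 for xj, zj, wj in zip(x, z, w))
```

Variables with a large spread inside clusters get small weights. Their contribution to the assignment distance therefore shrinks at the next iteration.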
A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining
Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is best suited for implementing this operation because of its efficiency …
  • 515
  • 43
FP-outlier: Frequent pattern based outlier detection
An outlier in a dataset is an observation or a point that is considerably dissimilar to or inconsistent with the remainder of the data. Detection of such outliers is important for many applications …
  • 170
  • 37
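The abstract defines an outlier but is truncated before the method. Below is a minimal sketch of a frequent-pattern outlier score, assumed here rather than taken from the abstract. A record that contains few of the data set's frequent itemsets is treated as more outlying. Its score is the summed support of the frequent itemsets it contains, divided by the number of frequent itemsets. Brute-force subset counting up to max_len stands in for a real frequent-pattern miner such as Apriori or FP-growth, so the sketch only suits small examples.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    transactions: List[Set[str]], min_support: float, max_len: int = 3
) -> Dict[FrozenSet[str], float]:
    """Brute-force support counting for every itemset of size <= max_len."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def fp_outlier_factor(transaction: Set[str], patterns: Dict[FrozenSet[str], float]) -> float:
    """Mean support over all frequent itemsets, counting those not contained as 0.

    Lower values mean the record shares fewer common patterns, i.e. it is more outlying.
    """
    if not patterns:
        return 0.0
    covered = sum(sup for s, sup in patterns.items() if s <= transaction)
    return covered / len(patterns)
```

Ranking records by ascending score puts the likeliest outliers first.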
A fuzzy k-modes algorithm for clustering categorical data
  • J. Huang, M. Ng
  • Mathematics, Computer Science
  • IEEE Trans. Fuzzy Syst.
  • 1 August 1999
This correspondence describes extensions to the fuzzy k-means algorithm for clustering categorical data. By using a simple matching dissimilarity measure for categorical objects and modes instead of …
  • 438
  • 33
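Following the abstract, objects are compared with the simple matching dissimilarity, and clusters are represented by modes. Below is a minimal sketch of the two fuzzy updates, assuming the usual fuzzy k-means form with a fuzziness exponent alpha > 1. Each membership is w_l = 1 / sum_h (d_l / d_h)**(1 / (alpha - 1)). Each mode attribute is set to the category with the largest summed w**alpha. The handling of an object that coincides with several identical modes is an assumption of this sketch.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def matching_dissimilarity(x: Sequence[str], y: Sequence[str]) -> int:
    """Number of attributes on which two categorical objects differ."""
    return sum(a != b for a, b in zip(x, y))


def fuzzy_memberships(x: Sequence[str], modes: Sequence[Sequence[str]], alpha: float) -> List[float]:
    """Membership of object x in each cluster; the values sum to 1."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    d = [matching_dissimilarity(x, z) for z in modes]
    n_zero = sum(1 for dl in d if dl == 0)
    if n_zero:
        # x equals one or more modes: full membership there (shared if modes coincide).
        return [1.0 / n_zero if dl == 0 else 0.0 for dl in d]
    p = 1.0 / (alpha - 1.0)
    return [1.0 / sum((dl / dh) ** p for dh in d) for dl in d]


def fuzzy_mode(data: Sequence[Sequence[str]], weights: Sequence[float], alpha: float) -> Tuple[str, ...]:
    """Per attribute, the category with the largest summed membership ** alpha for one cluster."""
    mode = []
    for j in range(len(data[0])):
        score = defaultdict(float)
        for x, w in zip(data, weights):
            score[x[j]] += w ** alpha
        mode.append(max(score, key=lambda c: score[c]))
    return tuple(mode)
```

Alternating these two updates until the memberships stabilize gives a fuzzy analogue of the hard k-modes loop. Raising alpha makes the memberships softer.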
An optimization algorithm for clustering using weighted dissimilarity measures
One of the main problems in cluster analysis is the weighting of attributes so as to discover structures that may be present. By using weighted dissimilarity measures for objects, a new approach is …
  • 202
  • 28
Topic oriented community detection through social objects and link analysis in social networks
Community detection is an important issue in social network analysis. Most existing methods detect communities through analyzing the linkage of the network. The drawback is that each community …
  • 138
  • 14
On the Impact of Dissimilarity Measure in k-Modes Clustering Algorithm
This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching dissimilarity measure for categorical objects, a heuristic approach …
  • 165
  • 13
TW-k-means: Automated two-level variable weighting clustering algorithm for multiview data
This paper proposes TW-k-means, an automated two-level variable weighting clustering algorithm for multiview data, which can simultaneously compute weights for views and individual variables. In this …
  • 123
  • 11