Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values

- J. Huang
- Computer Science
- Data Mining and Knowledge Discovery
- 1 September 1998

The k-means algorithm is well known for its efficiency in clustering large data sets. However, working only on numeric values prohibits it from being used to cluster real world data containing…

An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

- Liping Jing, M. Ng, J. Huang
- Computer Science
- IEEE Transactions on Knowledge and Data…
- 1 August 2007

This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces. In high-dimensional data, clusters of objects often exist in subspaces rather than in the…

Automated variable weighting in k-means type clustering

- J. Huang, M. Ng, H. Rong, Z. Li
- Medicine, Computer Science
- IEEE Transactions on Pattern Analysis and Machine…
- 1 May 2005

This paper proposes a k-means type clustering algorithm that can automatically calculate variable weights. A new step is introduced to the k-means clustering process to iteratively update variable…

A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining

- J. Huang
- Computer Science
- DMKD
- 1997

Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is best suited for implementing this operation because of its efficiency…

FP-outlier: Frequent pattern based outlier detection

- Zengyou He, X. Xu, J. Huang, Shengchun Deng
- Computer Science
- Comput. Sci. Inf. Syst.
- 2005

An outlier in a dataset is an observation or a point that is considerably dissimilar to or inconsistent with the remainder of the data. Detection of such outliers is important for many applications…

A fuzzy k-modes algorithm for clustering categorical data

This correspondence describes extensions to the fuzzy k-means algorithm for clustering categorical data. By using a simple matching dissimilarity measure for categorical objects and modes instead of…

An optimization algorithm for clustering using weighted dissimilarity measures

One of the main problems in cluster analysis is the weighting of attributes so as to discover structures that may be present. By using weighted dissimilarity measures for objects, a new approach is…

Topic oriented community detection through social objects and link analysis in social networks

- Zhongying Zhao, S. Feng, Q. Wang, J. Huang, G. Williams, Jianping Fan
- Computer Science
- Knowl. Based Syst.
- 1 February 2012

Community detection is an important issue in social network analysis. Most existing methods detect communities through analyzing the linkage of the network. The drawback is that each community…

On the Impact of Dissimilarity Measure in k-Modes Clustering Algorithm

- M. Ng, M. J. Li, J. Huang, Zengyou He
- Medicine, Computer Science
- IEEE Transactions on Pattern Analysis and Machine…
- 1 March 2007

This correspondence describes extensions to the k-modes algorithm for clustering categorical data. By modifying a simple matching dissimilarity measure for categorical objects, a heuristic approach…

TW-k-means: Automated two-level variable weighting clustering algorithm for multiview data

- Xiaojun Chen, X. Xu, J. Huang, Yunming Ye
- Computer Science
- IEEE Transactions on Knowledge and Data…
- 1 April 2013

This paper proposes TW-k-means, an automated two-level variable weighting clustering algorithm for multiview data, which can simultaneously compute weights for views and individual variables. In this…