Simple Measures of Individual Cluster-Membership Certainty for Hard Partitional Clustering

Dongmeng Liu and Jinko Graham
The American Statistician, pages 70-79
ABSTRACT We propose two probability-like measures of individual cluster-membership certainty that can be applied to a hard partition of the sample such as that obtained from the partitioning around medoids (PAM) algorithm, hierarchical clustering or k-means clustering. One measure extends the individual silhouette widths and the other is obtained directly from the pairwise dissimilarities in the sample. Unlike the classic silhouette, however, the measures behave like probabilities and can be… 
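For orientation, the classic individual silhouette width that the first measure extends can be computed directly from a pairwise dissimilarity matrix and a hard partition. The following is a minimal sketch of the classic silhouette (Rousseeuw, 1987) only, not of the measures proposed in the paper; the function name `silhouette_widths` and the toy data are illustrative assumptions.

```python
def silhouette_widths(dissim, labels):
    """Classic silhouette width s(i) = (b - a) / max(a, b) for each observation.

    dissim: square matrix (list of lists) of pairwise dissimilarities.
    labels: hard cluster label for each observation.
    Hypothetical helper for illustration; not the paper's proposed measures.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette widths need at least two clusters")
    widths = []
    for i in range(n):
        # Mean dissimilarity from i to the members of each cluster, excluding i.
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dissim[i][j] for j in members) / len(members)
        own = labels[i]
        if own not in mean_to:
            # Singleton cluster: the silhouette width is set to 0 by convention.
            widths.append(0.0)
            continue
        a = mean_to[own]                                    # within-cluster mean
        b = min(v for c, v in mean_to.items() if c != own)  # nearest other cluster
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Toy data (assumed for illustration): four points on a line, two clear groups.
points = [0.0, 1.0, 10.0, 11.0]
dissim = [[abs(p - q) for q in points] for p in points]
labels = [0, 0, 1, 1]
widths = silhouette_widths(dissim, labels)
```

Silhouette widths lie in [-1, 1], which is why they do not read as probabilities; values near 1 indicate an observation that sits well inside its assigned cluster.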
Work Orders - Value from Structureless Text in the Era of Digitisation
The outcomes of this work showcase the potential of machine learning to drive the digitization of not only new installations, but also older assets, where as a result the large amount of unstructured historical data becomes an advantage rather than a hindrance.
Spatio-temporal and cross-scale interactions in hydroclimate variability: a case-study in France
Abstract. Understanding how water resources vary at different temporal and spatial scales in response to climate is crucial to inform long-term management. Climate change impacts and induced trends
Spatiotemporal and cross-scale interactions in hydroclimate variability: a case-study in France
Abstract. Understanding how water resources vary in response to climate at different temporal and spatial scales is crucial to inform long-term management. Climate change impacts and induced trends
Implementation of Pam Cluster for Evaluating SaaS on the Cloud Computing Environment
A quality model is implemented using the data-mining Partitioning Around Medoids (PAM) clustering model to evaluate the quality of software as a service (SaaS) in the cloud computing environment.
Non-linear and non-stationary hydroclimate variability in France and the Euro-Atlantic area
The work presented in this thesis explores the non-linear and non-stationary aspects of the hydroclimate system in France and the Euro-Atlantic area. In part I, «Spatio-temporal scales of
Silhouettes and quasi residual plots for neural nets and tree-based classifiers
The PAC is used to construct a silhouette plot which is similar in spirit to the silhouette plot for cluster analysis (Rousseeuw, 1987), and the average silhouette width can be used to compare different classifications of the same dataset.


Model-Based Clustering, Discriminant Analysis, and Density Estimation
This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
Model-based Gaussian and non-Gaussian clustering
The classification maximum likelihood approach is sufficiently general to encompass many current clustering algorithms, including those based on the sum of squares criterion and on the criterion of Friedman and Rubin (1967), but it is restricted to Gaussian distributions and it does not allow for noise.
mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation
This version of mclust provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models.
Bayesian profile regression with an application to the National Survey of Children's Health.
This work proposes a method that addresses problems for categorical covariates by using, as its basic unit of inference, a profile formed from a sequence of covariate values, which is clustered into groups and associated via a regression model to a relevant outcome.
Silhouettes: a graphical aid to the interpretation and validation of cluster analysis
PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes.
PReMiuM is a recently developed R package for Bayesian clustering using a Dirichlet process mixture model. This model is an alternative to regression models, non-parametrically linking a response
FactoMineR: An R Package for Multivariate Analysis
FactoMineR is an R package dedicated to multivariate data analysis that can take into account different types of variables (quantitative or categorical), different kinds of structure in the data, and supplementary information (supplementary individuals and variables).
Similarity, Dissimilarity, and Distance, Measures of
Measures for evaluating similarity between a pair of units are discussed, showing that some coefficients are monotonic functions of each other and that many coefficients are particular instances of an overall measure of similarity.
Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis
This book discusses Geometric Data Analysis, a branch of Mathematical Bases, and its applications in Multivariate Statistics and Inductive Analysis, and some of the techniques used in this area have been described in detail in this book.
Similarity, Dissimilarity, and Distance Measure
Many coefficients exist that give measures of resemblance between a pair of cases, or samples, as opposed to association between pairs of variables; others measure resemblance between a pair of