• Corpus ID: 6278891

Some methods for classification and analysis of multivariate observations

  • J. MacQueen
The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, $S = \{S_1, S_2, \ldots, S_k\}$ is a partition of $E_N$, and $u_i$, $i = 1, 2, \ldots, k$, is the conditional mean of p over the set $S_i$, then $W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z)$ tends…
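The abstract describes the familiar assign-then-average cycle that the k-means name now stands for. A minimal self-contained sketch of that cycle is below; note this is the batch (Lloyd-style) variant, not MacQueen's original sequential formulation, which updates each mean one sample at a time, and the initialization by random sampling is an illustrative assumption.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Batch k-means on a list of d-dimensional tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: each center moves to the conditional mean of its set S_i
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: partition is stable
            break
        centers = new_centers
    return centers, clusters
```

Each iteration can only decrease the within-class variance W²(S), which is why the procedure terminates at a locally optimal partition.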
Supervised Nested Algorithm for Classification Based on K-Means
This paper presents an extension of the k-means algorithm, based on the idea of recursive partitioning, that can be used for supervised classification, and carries the integration of parametric models into trees one step further.
Implementation of the k-means Method for Single and Multi-Dimensions
Clustering is the process of grouping data into classes or clusters, where the class label of each object is not known [1]. In the case of a dataset to be clustered consisting of n objects,
Experiments for the Number of Clusters in K-Means
An adjusted iK-Means method is proposed, which performs well in the current experiment setting, and is compared to the least-squares and least-modules versions of the intelligent k-Means method by Mirkin.
Method of Classification through Normal Distribution Approximation Using Estimating the Adjacent and Multidimensional Scaling
This study proposes machine learning algorithms that approximate the density of the influence of the training data using a normal-distribution density function, and proposes an improved method that, as preprocessing, relocates the training data according to the distances between training points via multidimensional scaling.
Variable Selection in K-Means Clustering via Regularization
A new method of K-means clustering is proposed to detect variables irrelevant to the cluster structure; it calculates variable weights using an entropy regularization method.
Improved Clustering with Augmented k-means
Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering,
A Comparison of K-Means and Mean Shift Algorithms
Clustering, otherwise known as cluster analysis, is a learning problem that takes place without any human supervision. This technique has often been utilized efficiently in data analysis,
Semi-supervised clustering methods
  • E. Bair
  • Computer Science, Mathematics
    Wiley Interdisciplinary Reviews: Computational Statistics
  • 2013
This review describes several clustering algorithms (known as "semi-supervised clustering" methods) that can be applied in many situations, including document processing and modern genetics.
Model Based Penalized Clustering for Multivariate Data
This paper develops a decision-theoretic framework by which traditional K-means can be given a probabilistic footing. This not only enables soft clustering; the whole optimization problem can be recast in a Bayesian modeling framework in which the number of clusters is treated as an unknown parameter of interest, removing a severe constraint of the K-means algorithm.
A Comparison of Latent Class, K-Means, and K-Median Methods for Clustering Dichotomous Data
Simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data found that the three approaches can exhibit profound differences when applied to real data.


On Grouping for Maximum Homogeneity
Abstract Given a set of arbitrary numbers, what is a practical procedure for grouping them so that the variance within groups is minimized? An answer to this question, including a description of an
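For a set of plain numbers, the grouping problem posed in this abstract can be solved exactly by dynamic programming over contiguous groups of the sorted values, since an optimal variance-minimizing partition of a line never interleaves groups. The following is a hedged sketch of that idea under those assumptions, not code taken from the paper itself:

```python
def best_grouping(values, k):
    """Partition values into k groups minimizing the total
    within-group sum of squared deviations (contiguous-groups DP)."""
    xs = sorted(values)
    n = len(xs)

    def cost(i, j):
        # Within-group sum of squared deviations for xs[i:j].
        seg = xs[i:j]
        m = sum(seg) / len(seg)
        return sum((x - m) ** 2 for x in seg)

    INF = float("inf")
    # dp[g][j] = minimal cost of splitting the first j values into g groups.
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = dp[g - 1][i] + cost(i, j)
                if c < dp[g][j]:
                    dp[g][j], cut[g][j] = c, i
    # Recover the groups by walking the recorded cut positions backward.
    groups, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(xs[i:j])
        j = i
    return list(reversed(groups)), dp[k][n]
```

The O(k·n²) table (with O(n) cost evaluation here, reducible with prefix sums) is practical for the modest n this literature considers.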
Comparison of Experiments
1. Summary Bohnenblust, Shapley, and Sherman [2] have introduced a method of comparing two sampling procedures or experiments; essentially their concept is that one experiment a is more informative
Hierarchical Grouping to Optimize an Objective Function
Abstract A procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for
Note on Grouping
Abstract Suppose that it is required to condense observations of a variate into a small number of groups, the grouping intervals to be chosen to retain as much information as possible. One way of
Data analysis in the social sciences: what about the details?
  • G. Ball
  • Computer Science
    AFIPS '65 (Fall, part I)
  • 1965
This paper attempts to demonstrate that there exists a class of techniques more suitably oriented toward the capabilities of the digital computer than are conventional analytic statistical techniques, and maintains that these techniques are capable of considering details in social science data, that is, of relating the individuals described in the data.
  • L. Dubins, L. J. Savage
  • Mathematics, Medicine
    Proceedings of the National Academy of Sciences of the United States of America
  • 1965
On convergence of k-means and partitions with minimum average variance
  • Ann. Math. Statist.
  • 1965
Principles of Numerical Taxonomy
Decision Making Process in Pattern Recognition
  • 1962