Corpus ID: 6278891

Some methods for classification and analysis of multivariate observations

  title: Some methods for classification and analysis of multivariate observations
  author: J. MacQueen
The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, S = {S₁, S₂, …, S_k} is a partition of E^N, and u_i, i = 1, 2, …, k, is the conditional mean of p over the set S_i, then w²(S) = Σᵢ₌₁ᵏ ∫_{S_i} |z − u_i|² dp(z) tends…
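The sample analogue of this within-class variance criterion can be sketched directly. The following is a minimal illustrative implementation using the familiar batch (Lloyd-style) iteration with random initialization, not MacQueen's original sequential procedure; all names and parameters are assumptions for illustration:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Batch k-means on a list of d-dimensional points (tuples)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # initial means: k distinct sample points
    for _ in range(iters):
        # assignment step: each point z joins the set S_i of its nearest mean
        sets = [[] for _ in range(k)]
        for z in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(z, means[j])))
            sets[i].append(z)
        # update step: each mean u_i becomes the centroid of S_i
        for i, s in enumerate(sets):
            if s:
                means[i] = tuple(sum(c) / len(s) for c in zip(*s))
    # sample analogue of w^2(S): sum over classes of squared distances to the class mean
    w2 = sum(sum((a - b) ** 2 for a, b in zip(z, means[i]))
             for i, s in enumerate(sets) for z in s)
    return means, w2
```

Each iteration can only lower the within-class variance, which is why the procedure converges to a local (not necessarily global) minimum of w²(S).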

Supervised Nested Algorithm for Classification Based on K-Means

This paper presents an extension of the k-means algorithm, based on recursive partitioning, that can be used as a supervised classification algorithm; it carries the integration of parametric models into trees one step further.

Implementation of the k-means Method for Single and Multi - Dimensions

The k-means method uses the Euclidean distance measure, which appears to work well with compact clusters; it is scalable and efficient, is guaranteed to find a local minimum, and has many interesting applications.

Experiments for the Number of Clusters in K-Means

An adjusted iK-Means method is proposed, which performs well in the current experimental setting; it is compared with the least-squares and least-modules versions of Mirkin's intelligent version of the method.

Method of Classification through Normal Distribution Approximation Using Estimating the Adjacent and Multidimensional Scaling

This study proposes a machine learning algorithm that approximates the density of influence of the training data using a normal-distribution density function, and an improved method that, as a preprocessing step, relocates the training data according to the distances between training points via multidimensional scaling.

Variable Selection in K-Means Clustering via Regularization

A new K-means clustering method is proposed to detect variables irrelevant to the cluster structure; it computes variable weights using an entropy regularization method.

Improved Clustering with Augmented k-means

Augmented k-means frequently outperforms k-means by classifying observations into known clusters more accurately and/or converging in fewer iterations, which can be valuable when the data exhibit characteristics common in real datasets such as heterogeneity, non-sphericity, substantial overlap, and high scatter.

A Comparison of K-Means and Mean Shift Algorithms

This paper compares and studies two different clustering algorithms, k-means and mean shift, and determines and presents the intrinsic grouping of objects in a batch of unlabeled raw data based on their attributes.
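For contrast with k-means, mean shift requires no preset number of clusters: each point climbs toward a local density mode. A minimal flat-kernel sketch (the `bandwidth` parameter and tuple point representation are illustrative assumptions):

```python
def mean_shift(points, bandwidth, iters=30):
    """Flat-kernel mean shift: each point repeatedly moves to the mean
    of all points within `bandwidth` of it, drifting toward a density mode."""
    modes = []
    for z in points:
        m = z
        for _ in range(iters):
            # neighbors of the current position within the bandwidth radius
            neigh = [p for p in points
                     if sum((a - b) ** 2 for a, b in zip(p, m)) <= bandwidth ** 2]
            if not neigh:
                break
            m = tuple(sum(c) / len(neigh) for c in zip(*neigh))
        modes.append(m)
    return modes
```

Points whose trajectories end at (approximately) the same mode belong to the same cluster, so the number of clusters emerges from the bandwidth choice rather than from a parameter k.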

Model Based Penalized Clustering for Multivariate Data

This paper develops a decision-theoretic framework that gives traditional k-means a probabilistic footing. This not only enables soft clustering; the whole optimization problem can be recast in a Bayesian modeling framework in which the number of clusters is treated as an unknown parameter of interest, removing a severe constraint of the k-means algorithm.

DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

A k-means initialization similar to k-means++ is proposed that estimates K from the feature space while finding suitable initial centroids for k-means in a deterministic manner; it shows improvement in both the estimation and the final clustering performance.
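The snippet builds on k-means++-style seeding. A sketch of the standard (stochastic) k-means++ initialization that such methods start from — not the deterministic DISCERN procedure itself — might look like:

```python
import random

def kmeanspp_seeds(points, k, seed=0):
    """Standard k-means++ seeding: each new centroid is drawn with
    probability proportional to its squared distance to the nearest
    centroid already chosen."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    for _ in range(k - 1):
        # d2[j]: squared distance of point j to its nearest current centroid
        d2 = [min(sum((a - b) ** 2 for a, b in zip(z, c)) for c in centroids)
              for z in points]
        total = sum(d2)
        # sample the next centroid with probability d2[j] / total
        r = rng.random() * total
        acc = 0.0
        for z, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(z)
                break
        else:
            # floating-point fallback: take the farthest point
            centroids.append(max(zip(points, d2), key=lambda t: t[1])[0])
    return centroids
```

Because seeds are spread out in proportion to squared distance, far-apart clusters tend each to receive a seed, which is the behavior a deterministic variant would aim to reproduce without randomness.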

A Comparison of Latent Class, K-Means, and K-Median Methods for Clustering Dichotomous Data

Simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data found that the three approaches can exhibit profound differences when applied to real data.



Comparison of Experiments

Bohnenblust, Shapley, and Sherman [2] have introduced a method of comparing two sampling procedures or experiments; essentially their concept is that one experiment α is more informative…

Hierarchical Grouping to Optimize an Objective Function

Abstract A procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for…

Note on Grouping

Abstract Suppose that it is required to condense observations of a variate into a small number of groups, the grouping intervals to be chosen to retain as much information as possible. One way of…


  • L. Dubins, L. J. Savage
  • Mathematics
    Proceedings of the National Academy of Sciences of the United States of America
  • 1965
The probability that (X₁ + ⋯ + Xₙ) ≥ β + α(V₁ + ⋯ + Vₙ) for some n (1) is less than 1/(1 + αβ). This bound is sharp. Two lemmas, neither of which is difficult to verify, are used in the proof. The first of…

Data analysis in the social sciences: what about the details?

  • G. Ball
  • Sociology
    AFIPS '65 (Fall, part I)
  • 1965
This paper attempts to demonstrate that there exists a class of techniques more suitably oriented toward the capabilities of the digital computer than are conventional analytic statistical techniques, and maintains that these techniques are capable of considering details in social-science data, that is, of relating the individuals described in the data.

Remarks on the Economics of Information

Stochastic processes

On Grouping for Maximum Homogeneity

Abstract Given a set of arbitrary numbers, what is a practical procedure for grouping them so that the variance within groups is minimized? An answer to this question, including a description of an…
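The question this abstract poses — grouping numbers so that within-group variance is minimized — admits an exact dynamic-programming solution over contiguous groups of the sorted values. The sketch below is one such formulation under that assumption, not necessarily the paper's own procedure:

```python
def best_grouping(values, k):
    """Partition the sorted values into k contiguous groups minimizing the
    total within-group sum of squared deviations (O(k * n^2) DP)."""
    xs = sorted(values)
    n = len(xs)
    # prefix sums of x and x^2 for O(1) per-group cost
    p1 = [0.0] * (n + 1)
    p2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        p1[i + 1] = p1[i] + x
        p2[i + 1] = p2[i] + x * x

    def cost(i, j):  # sum of squared deviations of xs[i:j] about its mean
        s, q, m = p1[j] - p1[i], p2[j] - p2[i], j - i
        return q - s * s / m

    INF = float("inf")
    # dp[g][j]: best cost splitting the first j values into g groups
    dp = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = dp[g - 1][i] + cost(i, j)
                if c < dp[g][j]:
                    dp[g][j], cut[g][j] = c, i
    # recover the group boundaries by walking the cut table backwards
    bounds, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        bounds.append((i, j))
        j = i
    groups = [xs[i:j] for i, j in reversed(bounds)]
    return groups, dp[k][n]
```

Restricting attention to contiguous groups of the sorted values loses nothing in one dimension, since any variance-minimizing partition of scalars can be rearranged into contiguous intervals without increasing its cost.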

On uniform convergence of families of sequences of random variables