• Corpus ID: 6278891

Some methods for classification and analysis of multivariate observations

@inproceedings{MacQueen1967SomeMF,
  title={Some methods for classification and analysis of multivariate observations},
  author={J. MacQueen},
  year={1967}
}
The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if p is the probability mass function for the population, $S = \{S_1, S_2, \ldots, S_k\}$ is a partition of $E_N$, and $u_i$, $i = 1, 2, \ldots, k$, is the conditional mean of p over the set $S_i$, then $W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z)$ tends… 
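
A minimal sketch of the process the abstract describes, for readers who want to see the objective concretely. This is the familiar batch (Lloyd-style) variant with random seeding, not MacQueen's original sequential update; the function names and the toy data are illustrative only.

```python
# Batch k-means sketch (Lloyd-style iterations) together with the empirical
# analogue of the within-class variance W^2(S) from the abstract. MacQueen's
# procedure updates the means sequentially as samples arrive; this batch
# variant only illustrates the same objective.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # illustrative seeding
    for _ in range(n_iter):
        # Assign each observation to its nearest mean.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each mean as the centroid of its set S_i.
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

def within_class_variance(X, centers, labels):
    # Empirical counterpart of W^2(S) = sum_i integral_{S_i} |z - u_i|^2 dp(z).
    return sum(np.sum((X[labels == i] - c) ** 2)
               for i, c in enumerate(centers)) / len(X)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in ((0, 0), (3, 3), (0, 3))])
    centers, labels = kmeans(X, k=3)
    print("W^2(S) =", within_class_variance(X, centers, labels))
```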

Supervised Nested Algorithm for Classification Based on K-Means

This paper presents an extension of the k-means algorithm, based on the idea of recursive partitioning, that can be used as a classification algorithm in the supervised setting and carries the integration of parametric models into trees one step further.

Method of Classification through Normal Distribution Approximation Using Estimating the Adjacent and Multidimensional Scaling

This study proposes machine learning algorithms that approximate the density of the influence of the training data using a normal-distribution density function, along with an improved method that, as a preprocessing step, relocates the training data by multidimensional scaling of the distances between training points.

Variable Selection in K-Means Clustering via Regularization

A new method of K-means clustering is proposed that detects variables irrelevant to the cluster structure and computes variable weights using an entropy regularization method.
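
The entry above names an entropy regularization device for weighting variables; the sketch below shows the standard closed-form update that this kind of regularization yields, with weights proportional to exp(-D_j / gamma), where D_j is the within-cluster dispersion along variable j. The symbols D_j and gamma and the exact update are assumptions about the general technique, not a transcription of this paper's method.

```python
# Hedged sketch: entropy-regularized variable weights computed around a k-means
# partition. D[j] is the within-cluster dispersion along variable j and gamma is
# the strength of the entropy penalty; the update w_j ~ exp(-D_j / gamma) is the
# standard device for this kind of regularization, not necessarily the paper's
# exact formulation.
import numpy as np

def variable_weights(X, labels, centers, gamma=1.0):
    n_vars = X.shape[1]
    D = np.zeros(n_vars)
    for i, c in enumerate(centers):
        D += ((X[labels == i] - c) ** 2).sum(axis=0)  # per-variable dispersion
    w = np.exp(-D / gamma)
    return w / w.sum()  # weights sum to 1; near-zero weight flags an irrelevant variable
```

Variables that carry no cluster structure accumulate large dispersion and receive weights near zero, which is the variable-selection effect the summary refers to.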

Improved Clustering with Augmented k-means

Augmented k-means frequently outperforms k-means by more accurately classifying observations into known clusters and/or converging in fewer iterations, which can be valuable when the data exhibit many characteristics of real datasets such as heterogeneity, non-sphericity, substantial overlap, and high scatter.

A Comparison of K-Means and Mean Shift Algorithms

This paper compares and studies two different clustering algorithms, k-means and mean shift, and determines and presents the intrinsic grouping of the objects in a batch of unlabeled raw data based on their attributes.
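
A minimal side-by-side run of the two algorithms being compared, assuming scikit-learn is available; the dataset and parameters are illustrative. The practical contrast is that mean shift infers the number of clusters from a bandwidth, while k-means needs k specified up front.

```python
# Run k-means and mean shift on the same unlabeled data and compare the
# groupings they produce (scikit-learn assumed available).
import numpy as np
from sklearn.cluster import KMeans, MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.7, random_state=0)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
ms = MeanShift(bandwidth=estimate_bandwidth(X, quantile=0.2)).fit(X)

print("k-means clusters:   ", len(np.unique(km.labels_)))
print("mean-shift clusters:", len(np.unique(ms.labels_)))
```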

Model Based Penalized Clustering for Multivariate Data

This paper develops a decision-theoretic framework that gives traditional K-means a probabilistic footing: it not only enables soft clustering but also recasts the whole optimization problem in a Bayesian modeling framework, in which the number of clusters can be treated as an unknown parameter of interest, removing a severe constraint of the K-means algorithm.
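
The paper's own decision-theoretic construction is not reproduced here; the short sketch below only illustrates the general idea the summary points to, namely soft (probabilistic) cluster assignments with the effective number of components learned from the data, using scikit-learn's Dirichlet-process Gaussian mixture as a stand-in.

```python
# Illustration of soft clustering with the number of clusters treated as unknown,
# via a Dirichlet-process Gaussian mixture (a stand-in, not the paper's model).
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=3, cluster_std=0.6, random_state=0)

bgm = BayesianGaussianMixture(
    n_components=10,                                   # generous upper bound on K
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

soft_memberships = bgm.predict_proba(X)                # soft cluster assignments
effective_k = int(np.sum(bgm.weights_ > 1e-2))         # components the data actually use
print("effective number of clusters:", effective_k)
```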

DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

A K-means initialization similar to K-means++ is proposed that estimates K from the feature space while finding suitable initial centroids for K-means in a deterministic manner, and shows improvement in both the estimation and the final clustering performance.
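
A deterministic, diversity-based seeding in the spirit of the summary is sketched below using farthest-first (maximin) selection. DISCERN's actual similarity-based criterion and its rule for estimating K are more involved; here the number of seeds is given rather than estimated, so treat this as an assumption-laden illustration.

```python
# Deterministic diversity-based seeding sketch (farthest-first / maximin).
# Unlike k-means++ there is no random sampling: ties aside, the same data
# always yield the same seeds.
import numpy as np

def diverse_seeds(X, k):
    # Start from the point farthest from the overall mean (a deterministic choice).
    first = int(np.argmax(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    seeds = [first]
    dist_to_seeds = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist_to_seeds))        # most distant from current seed set
        seeds.append(nxt)
        dist_to_seeds = np.minimum(dist_to_seeds, np.linalg.norm(X - X[nxt], axis=1))
    return X[seeds]
```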

A Comparison of Latent Class, K-Means, and K-Median Methods for Clustering Dichotomous Data

Simulation-based comparisons of the latent class, K-means, and K-median approaches for partitioning dichotomous data found that the three approaches can exhibit profound differences when applied to real data.

Automation of Data Clusters based on Layered HMM

A novel method based on a Layered Hidden Markov Model (LHMM) is proposed to identify a suitable number of clusters in a given unlabeled dataset without prior knowledge of the number of clusters, and the experimental results indicate the efficacy of the method.

A Comparison of K-Means and Mean Shift Algorithms

This paper compares and contrasts two different clustering algorithms, k-means and mean shift, based on the following criteria: time complexity, training, prediction performance, and clustering accuracy.
...

References

SHOWING 1-10 OF 16 REFERENCES

On Grouping for Maximum Homogeneity

Abstract Given a set of arbitrary numbers, what is a practical procedure for grouping them so that the variance within groups is minimized? An answer to this question, including a description of an
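
The grouping problem posed here (partition sorted numbers into contiguous groups so that within-group variance is minimal) has an exact dynamic-programming solution; the sketch below implements that standard formulation as an illustration of the objective, not as a transcription of the reference's own procedure.

```python
# Exact 1-D grouping by dynamic programming: choose k contiguous groups of the
# sorted values that minimize the total within-group sum of squares.
import numpy as np

def optimal_grouping(values, k):
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    # Prefix sums give each group's sum of squares in O(1).
    s1 = np.concatenate([[0.0], np.cumsum(x)])
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])

    def ssq(i, j):  # within-group sum of squares for x[i:j], j exclusive
        m = j - i
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / m

    cost = np.full((k + 1, n + 1), np.inf)
    split = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = cost[g - 1, i] + ssq(i, j)
                if c < cost[g, j]:
                    cost[g, j], split[g, j] = c, i

    # Recover the group boundaries by walking the split table backwards.
    bounds, j = [], n
    for g in range(k, 0, -1):
        i = split[g, j]
        bounds.append((i, j))
        j = i
    groups = [x[i:j].tolist() for i, j in reversed(bounds)]
    return cost[k, n], groups

if __name__ == "__main__":
    print(optimal_grouping([1, 2, 2, 9, 10, 30], k=3))
```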

Comparison of Experiments

1. Summary Bohnenblust, Shapley, and Sherman [2] have introduced a method of comparing two sampling procedures or experiments; essentially their concept is that one experiment α is more informative

Hierarchical Grouping to Optimize an Objective Function

Abstract A procedure for forming hierarchical groups of mutually exclusive subsets, each of which has members that are maximally similar with respect to specified characteristics, is suggested for

Note on Grouping

Abstract Suppose that it is required to condense observations of a variate into a small number of groups, the grouping intervals to be chosen to retain as much information as possible. One way of

A TCHEBYCHEFF-LIKE INEQUALITY FOR STOCHASTIC PROCESSES.

  • L. Dubins, L. J. Savage
  • Mathematics
    Proceedings of the National Academy of Sciences of the United States of America
  • 1965
The probability that, for some n, $(X_1 + \cdots + X_n) \ge \beta + (\mu_1 + \cdots + \mu_n) + \alpha(V_1 + \cdots + V_n)$ (1) holds is less than $1/(1 + \alpha\beta)$. This bound is sharp. Two lemmas, neither of which is difficult to verify, are used in the proof. The first of

Data analysis in the social sciences: what about the details?

  • G. Ball
  • Sociology
    AFIPS '65 (Fall, part I)
  • 1965
This paper attempts to demonstrate that there exists a class of techniques more suitably oriented toward the capabilities of the digital computer than are conventional analytic statistical techniques, and maintains that these techniques are capable of considering details in social sciences data, that is, relating the individuals described in the data.

Principles of numerical taxonomy

On uniform convergence of families of sequences of random variables

Remarks on the Economics of Information

Stochastic processes