A Probability Theory of Cluster Analysis

  • Robert F. Ling
  • Published 1 March 1973
  • Mathematics
  • Journal of the American Statistical Association
Abstract An explicit definition of a (k, r)-cluster is proposed. Each (k, r)-cluster has the property that each of its elements is within a distance r of at least k other elements of the same cluster and the entire set can be linked by a chain of links each less than or equal to r. Some exact distributional results are derived under a nonmetric hypothesis for the case k = 1. An example is given to illustrate the use of probability theory in identifying significant clustering structures in terms… 
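The two properties stated in the abstract can be illustrated with a small check. This is a minimal sketch, not the paper's own procedure: the function name `is_kr_cluster` is hypothetical, Euclidean distance is assumed purely for illustration, and only the two properties quoted above are tested (any further conditions in the full definition are not covered).

```python
from itertools import combinations
from math import dist


def is_kr_cluster(points, k, r):
    """Return True if `points` satisfies the two properties in the abstract.

    Hypothetical helper; Euclidean distance is an assumption for illustration.
    """
    n = len(points)
    if n == 0:
        return False
    # Two elements are linked when their distance is at most r.
    neighbors = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if dist(points[i], points[j]) <= r:
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Property 1: each element is within r of at least k other elements.
    if any(len(nb) < k for nb in neighbors.values()):
        return False
    # Property 2: the entire set is linked by a chain of links, each <= r.
    seen, stack = {0}, [0]
    while stack:
        for j in neighbors[stack.pop()]:
            if j not in seen:
                seen.add(j)
                stack.append(j)
    return len(seen) == n
```

For three collinear points spaced one unit apart, `is_kr_cluster(points, 1, 1.0)` holds, while `k = 2` fails because each end point has only one neighbor within `r`.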
Measuring the Power of Hierarchical Cluster Analysis
Abstract The concept of power for monotone invariant clustering procedures is developed via the possible partitions of objects at each iteration level in the obtained hierarchy. At a given level, the
Some applications of graph theory to clustering
Several graph-theoretic criteria are proposed for use within a general clustering paradigm as a means of developing procedures “in between” the extremes of complete-link and single-link hierarchical partitioning.
A Hybrid Clustering Algorithm for Identifying High Density Clusters
A hybrid algorithm is proposed which combines elements of both the k-means and single linkage techniques, and is shown to be consistent, under certain regularity conditions, in one dimension.
Probability Tables for Cluster Analysis Based on a Theory of Random Graphs
Abstract Statistics based on a theory of random graphs have been proposed as an analytic aid to assess the randomness of a clustered structure. Probability tables for two such statistics are
Stability of a hierarchical clustering
Statistical theory in clustering
A number of statistical models for forming and evaluating clusters are reviewed. Hierarchical algorithms are evaluated by their ability to discover high density regions in a population, and complete
Cluster Definition by the Optimization of Simple Measures
  • T.A. Bailey, J. Cowles
  • Mathematics, Computer Science
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 1984
A pruned search tree algorithm is developed which is much faster than complete search, especially for graphs which are derived from points embedded in a space of low dimensionality.


Hierarchical clustering schemes
A useful correspondence is developed between any hierarchical system of such clusters, and a particular type of distance measure, that gives rise to two methods of clustering that are computationally rapid and invariant under monotonic transformations of the data.
The key defect in almost all clustering procedures seems to be the absence of a statistical model, and the suggestion is made that the clustering problem be stated as a mixture problem.
The construction technique is applied to voting behaviour of the 50 United States in the last 13 presidential elections, giving a tree clustering of the states.
Minimum Spanning Trees and Single Linkage Cluster Analysis
Minimum spanning trees (MST) and single linkage cluster analysis (SLCA) are explained and it is shown that all the information required for the SLCA of a set of points is contained in their MST.
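The relationship between the MST and single linkage can be sketched as follows. This is an illustrative sketch under stated assumptions, not the algorithm from the cited paper: the function names are hypothetical, Kruskal's algorithm is used to build the tree, and Euclidean distance is assumed. Once the MST is known, the single linkage clusters at any level are obtained by keeping only the tree edges no longer than that level.

```python
from itertools import combinations
from math import dist


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def mst_edges(points):
    """Build a minimum spanning tree with Kruskal's algorithm.

    Returns the tree edges as (length, i, j) tuples.
    """
    parent = list(range(len(points)))
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    tree = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # the edge joins two components, so it belongs to the tree
            parent[ri] = rj
            tree.append((d, i, j))
    return tree


def single_linkage_clusters(points, threshold):
    """Single linkage clusters at `threshold`, using only the MST edges."""
    parent = list(range(len(points)))
    for d, i, j in mst_edges(points):
        if d <= threshold:
            parent[_find(parent, i)] = _find(parent, j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(_find(parent, i), []).append(i)
    return sorted(groups.values())
```

Only the `n - 1` tree edges are consulted when forming clusters, rather than all pairwise distances, which is the sense in which the tree carries the information single linkage needs.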
On Some Clustering Techniques
  • R. Bonner
  • Computer Science
  • IBM J. Res. Dev.
  • 1964
A number of methods which make use of IBM 7090 computer programs to do clustering are described, and a medical research problem is used to illustrate and compare these methods.
Some Properties of Pascal Distribution for Finite Population
Abstract This note deals with a waiting time problem. The probability distribution of the number of independent random drawings, if the probability of success at a single drawing is constant, is well
The application of computers to taxonomy.
  • P. Sneath
  • Computer Science
  • Journal of General Microbiology
  • 1957
SUMMARY: A method is described for handling large quantities of taxonomic data by an electronic computer so as to yield the outline of a classification based on equally weighted features. This
A General Theory of Classificatory Sorting Strategies: 1. Hierarchical Systems
It is shown that the computational behaviour of a hierarchical sorting-strategy depends on three properties, which are established for five conventional strategies and four measures. The conventional