Shape complexity in cluster analysis

@article{Aguilar2022ShapeCI,
  title={Shape complexity in cluster analysis},
  author={Eduardo Jes{\'u}s Aguilar and Valmir Carneiro Barbosa},
  journal={ArXiv},
  year={2022},
  volume={abs/2205.08046}
}
In cluster analysis, a common first step is to scale the data in order to better partition them into clusters. Even though many different techniques have been introduced to this end over the years, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimension. Like the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data. Here…
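The baseline preprocessing step the abstract describes, dividing each dimension by its standard deviation, can be sketched in a few lines of Python. This is not the authors' proposed method, which the truncated abstract does not reveal; it only illustrates the conventional step they contrast it with. The function names, the use of the population standard deviation (`statistics.pstdev`), and the guard that leaves zero-spread columns unscaled are assumptions of this sketch. A range-based variant is included for comparison, since the standardization study listed among the references reports that dividing by the range recovered cluster structure best.

```python
import statistics
from typing import List

Matrix = List[List[float]]  # rows are data points, columns are dimensions


def scale_by_std(data: Matrix) -> Matrix:
    """Divide every column (dimension) by its standard deviation.

    Assumption: population standard deviation; statistics.stdev (the
    sample version) is an equally common choice. Columns with zero
    spread are left unscaled to avoid division by zero.
    """
    columns = list(zip(*data))
    scales = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[x / s for x, s in zip(row, scales)] for row in data]


def scale_by_range(data: Matrix) -> Matrix:
    """Divide every column by its range (max - min), same zero guard."""
    columns = list(zip(*data))
    scales = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[x / s for x, s in zip(row, scales)] for row in data]


if __name__ == "__main__":
    # Hypothetical data: two features on very different scales.
    points = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
    print(scale_by_std(points))
    print(scale_by_range(points))
```

After `scale_by_std`, every non-constant column has unit standard deviation; after `scale_by_range`, every non-constant column spans exactly 1. Centering is omitted because subtracting a per-column constant does not change Euclidean distances between points.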


References

Showing 1-10 of 26 references
Janus Point: A New Theory of Time
Pooled variable scaling for cluster analysis
TLDR
This work proposes a new approach to scaling prior to cluster analysis, based on the concept of pooled variance, and uses it to cluster a high-dimensional genomic dataset consisting of gene expression data from several breast cancer tissue specimens obtained from human patients.
Shape Dynamics: Relativity and Relationalism
Introduction to linear and nonlinear programming
A study of standardization of variables in cluster analysis
TLDR
The present simulation study examined the standardization problem and found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure.
Weighted Standardization—A General Data Transformation Method Proceeding Classification Procedures
During preparatory steps of data for automatic classification routines, the amount of information contained by the character distribution is reduced by standardization of the character values. This
Optimal variable weighting for hierarchical clustering: An alternating least-squares algorithm
TLDR
A new methodology which simultaneously estimates in a least-squares fashion both an ultrametric tree and respective variable weightings for profile data that have been converted into (weighted) Euclidean distances is presented.
Synthesized clustering: A method for amalgamating alternative clustering bases with differential weighting of variables
TLDR
A new method is proposed (SYNCLUS, SYNthesized CLUStering) for dealing with the problem of how the various contributory variables in a specific battery can be weighted so as to enhance any cluster structure that may be present.
On comparing partitions
Rand (1971) proposed the Rand Index to measure the stability of two partitions of one set of units. Hubert and Arabie (1985) corrected the Rand Index for chance (Adjusted Rand Index). In this paper,
...
...
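The Rand index and its chance-corrected form by Hubert and Arabie (1985), mentioned in the entry "On comparing partitions", are standard ways of scoring how well a clustering agrees with a reference partition. A minimal, dependency-free Python sketch follows, built from pair counts over the contingency table of the two labelings. The function names are hypothetical, and returning 1.0 in the degenerate case where the adjustment is undefined (for example, both partitions consisting of one cluster) is a convention assumed here, not something the cited paper prescribes.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence, Tuple


def _pair_counts(a: Sequence[Hashable], b: Sequence[Hashable]) -> Tuple[int, int, int, int]:
    """Return (total pairs, sum of C(n_ij, 2), sum of C(a_i, 2), sum of C(b_j, 2))."""
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same set of units")
    cells = Counter(zip(a, b))  # n_ij: units in cluster i of a and cluster j of b
    rows = Counter(a)           # cluster sizes in partition a
    cols = Counter(b)           # cluster sizes in partition b
    return (
        comb(len(a), 2),
        sum(comb(c, 2) for c in cells.values()),
        sum(comb(c, 2) for c in rows.values()),
        sum(comb(c, 2) for c in cols.values()),
    )


def rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of unit pairs on which the partitions agree (together in both or apart in both)."""
    pairs, s_ij, s_a, s_b = _pair_counts(a, b)
    if pairs == 0:
        return 1.0  # fewer than two units: nothing to disagree on
    return (pairs - s_a - s_b + 2 * s_ij) / pairs


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Rand index corrected for chance: (index - expected) / (max - expected)."""
    pairs, s_ij, s_a, s_b = _pair_counts(a, b)
    expected = s_a * s_b / pairs if pairs else 0.0
    max_index = 0.5 * (s_a + s_b)
    if max_index == expected:
        return 1.0  # undefined ratio; convention assumed in this sketch
    return (s_ij - expected) / (max_index - expected)
```

For instance, `adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1])` evaluates to 4/7 (about 0.571), two labelings that differ only by a renaming of clusters score 1.0, and the adjusted index can go negative when agreement is worse than chance.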