• Corpus ID: 211677422

Statistical power for cluster analysis

@article{Dalmaijer2020StatisticalPF,
  title={Statistical power for cluster analysis},
  author={Edwin S. Dalmaijer and C. L. Nord and Duncan E. Astle},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.00381}
}
Cluster algorithms are gaining in popularity due to their compelling ability to identify discrete subgroups in data, and their increasing accessibility in mainstream programming languages and statistical software. While researchers can follow guidelines to choose the right algorithms, and to determine what constitutes convincing clustering, there are no firmly established ways of computing a priori statistical power for cluster analysis. Here, we take a simulation approach to estimate power and… 
Unsupervised KPIs-Based Clustering of Jobs in HPC Data Centers
TLDR
This paper concluded that those metrics (KPIs) related to the network (interface) traffic monitoring provided the best cohesion and separation to cluster HPC jobs, and hierarchical clustering algorithms were the most suitable for this task.
Cluster Analysis Revealed Two Hidden Phenotypes of Cluster Headache
TLDR
The cluster 1 phenotype may suggest a genetic or biology-based etiology, whereas the cluster 2 phenotype may be related to epigenetic mechanisms, possibly indicating different underlying genetic mechanisms.
Direct and indirect links between children’s socio-economic status and education: pathways via mental health, attitude, and cognition
A child’s socio-economic environment can profoundly affect their development. While existing literature focusses on simplified metrics and pair-wise relations between few variables, we aimed to
One size fits all? Segmenting consumers to predict sustainable fashion behavior
Purpose: This study segmented consumers by combining emotional and shopping characteristics to develop typologies that classify their consumption patterns and disposal
Transdiagnostic phenotypes of compulsive behavior and associations with psychological, cognitive, and neurobiological affective processing
TLDR
Although independent larger samples are needed to confirm the stability of subtypes, these data offer an integrated understanding of how different systems may interact in compulsive behavior and provide new considerations for guiding tailored intervention decisions.
Smoker profiles and their influence on smokers’ intention to use a digital decision aid aimed at the uptake of evidence-based smoking cessation tools: An explorative study
TLDR
The GDMS can be used to identify smokers who are interested in a digital DA early on and tailor recruitment and DA content, and indicates that cluster membership affected intention via socio-cognitive variables.
Therapies for Long COVID in non-hospitalised individuals: from symptoms, patient-reported outcomes and immunology to targeted therapies (The TLC Study)
TLDR
The symptom burden and underlying pathophysiology of Long COVID syndromes in non-hospitalised individuals and potential therapies are evaluated to establish the evidence base for appropriate therapies and recommend interventions for each newly characterised Long COVID syndrome.
Inferring Energy Consumption Patterns in Public Buildings
The advent of smart meters has radically changed the mechanisms traditionally used for energy consumption monitoring. The possibility of having (i) highly frequent readings (even every minute); (ii)

References

How many clusters are best? - An experiment
  • R. Dubes, Computer Science, Pattern Recognit., 1987
A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise
TLDR
DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape, is presented; it requires only one input parameter and supports the user in determining an appropriate value for it.
Data clustering: a review
TLDR
An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Visualizing Data using t-SNE
TLDR
A new technique called t-SNE is presented that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. It is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
A method for testing the distinctness of clusters: A test of the disjunction of two clusters in Euclidean space as measured by their overlap
A method is described for testing the distinctness of two clusters in Euclidean space. One first calculates the projections, q, of the N1 and N2 members of the clusters onto the line joining the cluster
Relative clustering validity criteria: A comparative overview
TLDR
An alternative, possibly complementary methodology for comparing clustering validity criteria is described and an extensive comparison of the performances of 40 criteria over a collection of 962,928 partitions derived from five well‐known clustering algorithms and 1080 different data sets of a given class of interest is made.
hdbscan: Hierarchical density based clustering
TLDR
HDBSCAN performs DBSCAN over varying epsilon values and integrates the result to find a clustering that gives the best stability over epsilon, which allows HDBSCAN to find clusters of varying densities, and be more robust to parameter selection.
Measuring the Power of Hierarchical Cluster Analysis
The concept of power for monotone invariant clustering procedures is developed via the possible partitions of objects at each iteration level in the obtained hierarchy. At a given level, the
An extensive comparative study of cluster validity indices