Clustering by compression

Rudi L. Cilibrasi and Paul M. B. Vitányi
IEEE Transactions on Information Theory
We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows. First, we determine a parameter-free, universal similarity distance, the normalized compression distance (NCD), computed from the lengths of compressed data files, both singly and in pairwise concatenation. Second, we apply a hierarchical clustering method. The NCD is not restricted to a specific application area, and works across…
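The first step can be illustrated with a minimal sketch. It assumes the standard NCD formula, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length, and it uses Python's `zlib` as a stand-in compressor; that choice of compressor is an assumption made here for illustration. The helper names `compressed_len`, `ncd`, and `ncd_matrix` are hypothetical and not taken from the paper.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(.))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)  # compressed length of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD values, usable as input to a hierarchical clustering step."""
    n = len(items)
    return [[ncd(items[i], items[j]) for j in range(n)] for i in range(n)]
```

With a real compressor the values are only approximately symmetric, and the distance of an object to itself is small but usually not exactly zero. The resulting matrix could be passed to any hierarchical clustering routine; this sketch does not reproduce the clustering step itself.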
