Clustering by compression

@article{Cilibrasi2005ClusteringBC,
  title={Clustering by compression},
  author={Rudi L. Cilibrasi and Paul M. B. Vit{\'a}nyi},
  journal={IEEE Transactions on Information Theory},
  year={2005},
  volume={51},
  pages={1523--1545}
}
We present a new method for clustering based on compression. The method does not use subject-specific features or background knowledge, and works as follows: First, we determine a parameter-free, universal, similarity distance, the normalized compression distance or NCD, computed from the lengths of compressed data files (singly and in pairwise concatenation). Second, we apply a hierarchical clustering method. The NCD is not restricted to a specific application area, and works across… 
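The two steps described in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited form of the distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes. The choice of `zlib` as the compressor and the helper names `compressed_len`, `ncd`, and `distance_matrix` are illustrative assumptions, not taken from the paper. The sketch stops at the pairwise distance matrix, which would then be handed to whatever hierarchical clustering method is used.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx = compressed_len(x)
    cy = compressed_len(y)
    cxy = compressed_len(x + y)  # compressed length of the pairwise concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD matrix; the diagonal is set to 0.0 by convention."""
    n = len(items)
    return [
        [0.0 if i == j else ncd(items[i], items[j]) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical sample inputs, chosen only to show the call pattern.
    samples = [
        b"the quick brown fox jumps over the lazy dog. " * 20,
        b"the quick brown fox leaps over the lazy dog. " * 20,
        b"lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 20,
    ]
    for row in distance_matrix(samples):
        print(["%.3f" % d for d in row])
```

With a real compressor the values are only approximate: NCD(x, x) is small but usually not exactly zero, and NCD(x, y) may differ slightly from NCD(y, x), so a clustering step that needs a symmetric matrix may want to average the two directions.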
Few-Shot Non-Parametric Learning with Deep Latent Variable Model
TLDR
Non-Parametric learning by Compression with Latent Variables (NPC-LV), a learning framework for any dataset with abundant unlabeled data but very few labeled ones, is proposed, and it is shown that under NPC-LV, improvements in generative models can enhance downstream classification accuracy.
Learning Models for Cyber-Physical Systems
The similarity metric
TLDR
A new "normalized information distance" is proposed, based on the noncomputable notion of Kolmogorov complexity; it is shown to be a metric and is called the similarity metric.
The Hiatus Between Organism and Machine Evolution: Contrasting Mixed Microbial Communities with Robots
Mixed microbial communities, usually composed of various bacterial and fungal species, are fundamental in a plethora of environments, from soil to human gut and skin. Their evolution is a
Discretization of Fractional Operators: Analysis by Means of Advanced Computational Techniques
TLDR
This paper studies the discretization of fractional operators by means of advanced clustering methods; visualization of the graphical representations allows a better understanding of the properties embedded in each type of approximation of the fractional operator.
Advances in the computational analysis of SARS-COV2 genome
TLDR
The results of the synergistic approach reveal the complex time dynamics of the evolutionary process and may help to clarify future directions of the SARS-CoV-2 evolution.
Characterization of Animal Movement Patterns using Information Theory: a Primer
TLDR
A series of non-parametric information-theoretic measures, namely Shannon entropy, mutual information, Kullback-Leibler divergence and Kolmogorov complexity, are described that can be used to derive new insights about animal behaviour, with a specific focus on movement patterns.
Efficient DNA sequence compression with neural networks
TLDR
GeCo3, a new genomic sequence compressor that uses neural networks for mixing multiple context and substitution-tolerant context models, is created and benchmarked as a reference-free DNA compressor in 5 datasets.
Understanding COVID-19 nonlinear multi-scale dynamic spreading in Italy
The outbreak of COVID-19 in Italy took place in Lombardia, a densely populated and highly industrialized northern region, and spread across the northern and central part of Italy according to quite
Theoretical Computer Science: Computational Complexity
TLDR
An elegant tool for proving lower bounds on time/space complexity is a totally different notion of complexity: Kolmogorov complexity, which measures information content.