Algorithmic clustering of music

  title={Algorithmic clustering of music},
  author={Rudi L. Cilibrasi and Paul M. B. Vit{\'a}nyi and Ronald de Wolf},
  journal={Proceedings of the Fourth International Conference on Web Delivering of Music, 2004. WEDELMUSIC 2004.},
We present a method for hierarchical music clustering, based on compression of strings that represent the music pieces. The method uses no background knowledge about music whatsoever: it is completely general and can, without change, be used in different areas like linguistic classification, literature, and genomics. Indeed, it can be used to simultaneously cluster objects from completely different domains, like with like. It is based on an ideal theory of the information content in individual… 
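
The abstract above is truncated before it names the distance measure. Several entries listed further down this page refer to the normalized compression distance (NCD), so the following is a minimal sketch of that quantity, not the authors' implementation. The choice of `bz2` as the compressor, the helper names `compressed_size` and `ncd`, and the toy byte strings standing in for encoded music pieces are all assumptions made for illustration.

```python
import bz2
import random


def compressed_size(data: bytes) -> int:
    """Length in bytes of the bz2-compressed input (assumed compressor)."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size and xy is the concatenation.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy "pieces": two share a repeated motif, one is random noise.
motif = [60, 62, 64, 65, 67, 65, 64, 62]
piece_a = bytes(motif * 40)
piece_b = bytes(motif * 38 + [60, 64, 67, 72] * 4)
rng = random.Random(0)  # fixed seed so the noise piece is reproducible
piece_c = bytes(rng.randrange(256) for _ in range(320))

print("NCD(a, b) =", ncd(piece_a, piece_b))
print("NCD(a, c) =", ncd(piece_a, piece_c))
```

Pieces that share structure compress well together, so their distance is smaller than the distance to unrelated data. A full pipeline would compute this distance for every pair of pieces and pass the resulting matrix to a hierarchical clustering step, which this sketch leaves out.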

MIDI Music Genre Classification by Invariant Features
This work combines techniques of selection and extraction of musically invariant features with a classification distance similarity metric, which is an approximation of the theoretical, yet computationally intractable, Kolmogorov complexity.
A simple genetic algorithm for music generation by means of algorithmic information theory
The use of this distance as a fitness function which may be used by genetic algorithms to automatically generate music in a given pre-defined style is proposed and a simplified algorithm is developed that obtains interesting results.
Exploring Automated Music Genre Classification
The results show that SVMs work the best out of all considered techniques, and the report suggests why this might be so, and offers suggestions on improving both clustering and classification techniques.
Computational Topology in Music Analysis
In “Dynamical and Topological Tools for (Modern) Music Analysis” by Bergomi [Ber15] various methods are described for the purposes of analysis and classification of music by its acoustic properties.
Reducing the Loss of Information through Annealing Text Distortion
It is shown how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy.
Using PCA and K-Means to Predict Likeable Songs from Playlist Information
A collaborative filtering approach using two different models (K-means and hierarchical clustering) is used to separate playlist data into clusters for comparison. When tested on a small sample of users, the system recommended songs that the users considered likeable 60% of the time, while still finding songs that were generally diverse.
Evolving computer-generated music by means of the normalized compression distance
The superiority of the relative pitch envelope over other musical parameters, such as the lengths of the notes, has been confirmed, bringing us to develop a simplified algorithm that nevertheless obtains interesting results.
Clustering by compression
A general mathematical theory of universal similarity is developed and tested on real-world applications in a wide range of fields, including the first completely automatic construction of the phylogeny tree based on whole mitochondrial genomes and a language tree for over 50 Euro-Asian languages.
Testing genetic algorithm recombination strategies and the normalized compression distance for computer-generated music
The use of the normalized compression distance as a fitness function for the automatic generation of music by means of genetic algorithms is analyzed, and the effect on performance of several genetic recombination procedures is tested.
On clustering Romance languages
This work investigates the similarity of Romance languages based on the syllables excerpted from the representative vocabularies of seven Romance languages to find out if the Romance languages form "natural" clusters that can be labelled in a meaningful manner.


Folk Music Classification Using Hidden Markov Models
The work on the classification of folk music from different countries based on their monophonic melodies using hidden Markov models suggests to us a new way to think about musical style similarity.
Music style and authorship categorization by informative compressors
Recently, a novel parameter based on the compressibility of an informative sequence was introduced. The best compression rate of a data sequence is related to the self-similarity of the sequence and
This paper presents a process for determining the music genre of an item using the Discrete Wavelet Transform and a round-robin classification technique that achieves very high classification accuracy.
Musical genre classification of audio signals
The automatic classification of audio signals into a hierarchy of musical genres is explored and three feature sets for representing timbral texture, rhythmic content and pitch content are proposed.
Using Machine-Learning Methods for Musical Style Modeling
This research seeks to capture some of the regularity apparent in the composition process by using statistical and information theoretic tools to analyze musical pieces and generate new works that imitate the style of the great masters.
Query by humming: musical information retrieval in an audio database
A system for querying an audio database by humming is described, along with a scheme for representing the melodic information in a song as relative pitch changes, and performance results indicating the system's effectiveness are presented.
A Machine Learning Approach to Musical Style Recognition
This work demonstrates that machine learning can be used to build effective style classifiers for interactive performance systems and presents an analysis explaining why these techniques work so well when hand-coded approaches have consistently failed.
Language trees and zipping.
A very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series based on data-compression techniques, featuring highly accurate results for language recognition, authorship attribution, and language classification.
A Block-sorting Lossless Data Compression Algorithm
A block-sorting, lossless data compression algorithm and its implementation are described, and the performance of the implementation is compared with widely available data compressors running on the same hardware.
Algorithm makes tongue tree
Researchers in Italy have developed a program that can spot enough subtle differences between two authors' works to attribute authorship and help a computer tell Dante from Machiavelli.