Algorithmic clustering of music

@article{Cilibrasi2004AlgorithmicCO,
  title={Algorithmic clustering of music},
  author={Rudi L. Cilibrasi and Paul M. B. Vit{\'a}nyi and Ronald de Wolf},
  journal={Proceedings of the Fourth International Conference on Web Delivering of Music, 2004. WEDELMUSIC 2004.},
  year={2004},
  pages={110--117}
}
We present a method for hierarchical music clustering, based on compression of strings that represent the music pieces. The method uses no background knowledge about music whatsoever: it is completely general and can, without change, be used in different areas like linguistic classification, literature, and genomics. Indeed, it can be used to simultaneously cluster objects from completely different domains, like with like. It is based on an ideal theory of the information content in individual… 
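
As a rough illustration of the compression-based idea the abstract describes, the sketch below computes a normalized compression distance (NCD) between byte strings and builds a pairwise distance matrix. The NCD formula follows the related "Clustering by compression" work listed on this page. The choice of Python's bz2 module as the compressor and the helper names compressed_size, ncd, and distance_matrix are assumptions made for this example; the truncated abstract does not confirm them.

```python
import bz2


def compressed_size(data: bytes) -> int:
    """Length in bytes of the bz2-compressed input (assumed compressor)."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size and xy is the concatenation.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD values for a list of byte strings (hypothetical helper)."""
    return [[ncd(a, b) for b in items] for a in items]
```

Values near 0 suggest that two strings share much of their structure, while values near 1 suggest they share little. A matrix like this could then be passed to a hierarchical clustering routine; which routine the paper uses is not stated in the excerpt above.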

MIDI Music Genre Classification by Invariant Features
TLDR
This work combines techniques of selection and extraction of musically invariant features with a classification distance similarity metric, which is an approximation of the theoretical, yet computationally intractable, Kolmogorov complexity.
A simple genetic algorithm for music generation by means of algorithmic information theory
TLDR
The use of this distance as a fitness function for genetic algorithms that automatically generate music in a given pre-defined style is proposed, and a simplified algorithm is developed that obtains interesting results.
Exploring Automated Music Genre Classification
TLDR
The results show that SVMs work the best out of all considered techniques, and the report suggests why this might be so, and offers suggestions on improving both clustering and classification techniques.
Computational Topology in Music Analysis
In “Dynamical and Topological Tools for (Modern) Music Analysis” by Bergomi [Ber15] various methods are described for the purposes of analysis and classification of music by its acoustic properties.
Using the Universal Similarity Metric to Model Artificial Creativity and Predict Human Listeners Response to Evolutionary Music
TLDR
This paper uses a k-Nearest Neighbor classifier to approximate the Information Distance between the new, unclassified, musical piece and a corpus of observed musical pieces rated by the user with the Universal Similarity Metric, and indicates that the universal similarity metric is a very general and versatile approach to modeling Artificial Creativity.
The similarity metric
TLDR
A new "normalized information distance" is proposed, based on the noncomputable notion of Kolmogorov complexity, and it is demonstrated that it is a metric and called the similarity metric.
Reducing the Loss of Information through Annealing Text Distortion
TLDR
It is shown how progressively removing words in such a way that the complexity of a document is slowly reduced helps the compression-based text clustering and improves its accuracy.
Using PCA and K-Means to Predict Likeable Songs from Playlist Information
TLDR
A collaborative filtering approach using two different models (K-means and hierarchical clustering) is used to separate playlist data into clusters for comparison. When tested on a small sample of users, the system recommended songs that the users considered likeable 60% of the time, while still finding songs that were generally diverse.
Evolving computer-generated music by means of the normalized compression distance
TLDR
The superiority of the relative pitch envelope over other musical parameters, such as the lengths of the notes, has been confirmed, bringing us to develop a simplified algorithm that nevertheless obtains interesting results.
Clustering by compression
TLDR
A general mathematical theory of universal similarity is developed and tested on real-world applications in a wide range of fields, including the first completely automatic construction of the phylogeny tree based on whole mitochondrial genomes and a language tree for over 50 Euro-Asian languages.
...

References

SHOWING 1-10 OF 33 REFERENCES
Folk Music Classification Using Hidden Markov Models
TLDR
The work on the classification of folk music from different countries based on their monophonic melodies using hidden Markov models suggests to us a new way to think about musical style similarity.
Music style and authorship categorization by informative compressors
Recently, a novel parameter based on the compressibility of an informative sequence was introduced. The best compression rate of a data sequence is related to the self-similarity of the sequence and
CLASSIFYING MUSIC BY GENRE USING THE WAVELET PACKET TRANSFORM AND A ROUND-ROBIN ENSEMBLE
TLDR
This paper presents a process for determining the music genre of an item using the Discrete Wavelet Transform and a round-robin classification technique that achieves very high classification accuracy.
Musical genre classification of audio signals
TLDR
The automatic classification of audio signals into a hierarchy of musical genres is explored and three feature sets for representing timbral texture, rhythmic content and pitch content are proposed.
The similarity metric
TLDR
A new "normalized information distance" is proposed, based on the noncomputable notion of Kolmogorov complexity, and it is demonstrated that it is a metric and called the similarity metric.
Using Machine-Learning Methods for Musical Style Modeling
TLDR
This research seeks to capture some of the regularity apparent in the composition process by using statistical and information theoretic tools to analyze musical pieces and generate new works that imitate the style of the great masters.
Query by humming: musical information retrieval in an audio database
TLDR
A system for querying an audio database by humming is described, along with a scheme for representing the melodic information in a song as relative pitch changes, and performance results indicating the system's effectiveness are presented.
A Machine Learning Approach to Musical Style Recognition
TLDR
This work demonstrates that machine learning can be used to build effective style classifiers for interactive performance systems and presents an analysis explaining why these techniques work so well when hand-coded approaches have consistently failed.
Language trees and zipping
TLDR
A very general method for extracting information from a generic string of characters, e.g., a text, a DNA sequence, or a time series based on data-compression techniques, featuring highly accurate results for language recognition, authorship attribution, and language classification.
A Block-sorting Lossless Data Compression Algorithm
TLDR
A block-sorting, lossless data compression algorithm and an implementation of it are described, and the implementation's performance is compared with that of widely available data compressors running on the same hardware.
...