Automatic Lyrics-based Music Genre Classification in a Multilingual Setting


A large amount of research has addressed the classification of lyrics into genres, but most of this work has used solely English lyrics. This study investigates the implications of classifying a multilingual database and the effectiveness of a number of techniques and algorithms for doing so. Part of this work involves the creation of a high-quality dataset for use in the research. The paper finds that preprocessing multilingual text poses significant challenges, and that traditional techniques such as stemming and stop-word removal may do more harm than good in such circumstances. It also finds that classes with a strong bias toward a single language may be classified more successfully than those spanning multiple languages.

Cite this paper

@inproceedings{Howard2011AutomaticLM,
  title={Automatic Lyrics-based Music Genre Classification in a Multilingual Setting},
  author={Sam Howard and Colin G. Johnson},
  year={2011}
}