Multimodal Music Mood Classification Using Audio and Lyrics

Cyril Laurier, Jens Grivolla, Perfecto Herrera
2008 Seventh International Conference on Machine Learning and Applications
In this paper we present a study on music mood classification using audio and lyrics information. [...] We show that standard distance-based methods and latent semantic analysis are able to classify the lyrics significantly better than random, but the performance is still quite inferior to that of audio-based techniques. We then introduce a method based on differences between language models that gives performance closer to that of audio-based classifiers.
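
The abstract does not spell out how the language-model differences are computed, so the following is only a minimal sketch under stated assumptions: each mood category is treated as a binary problem, a "language model" is approximated by per-term document frequencies, and the terms whose frequencies differ most between the two classes are kept as features. The function names (`term_frequencies`, `discriminative_terms`, `featurize`) and the parameter `k` are hypothetical and do not come from the paper.

```python
from collections import Counter


def term_frequencies(docs):
    """Fraction of documents in `docs` (lists of tokens) that contain each term."""
    counts = Counter()
    for tokens in docs:
        counts.update(set(tokens))  # count each term once per document
    n = len(docs)
    return {term: c / n for term, c in counts.items()}


def discriminative_terms(pos_docs, neg_docs, k=3):
    """Return the k terms whose document frequency differs most between classes.

    Assumption: absolute difference of document frequencies stands in for the
    'difference between language models' mentioned in the abstract.
    """
    pos = term_frequencies(pos_docs)
    neg = term_frequencies(neg_docs)
    vocab = set(pos) | set(neg)
    diff = {t: abs(pos.get(t, 0.0) - neg.get(t, 0.0)) for t in vocab}
    # Sort by decreasing difference, breaking ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-diff[t], t))[:k]


def featurize(tokens, terms):
    """Represent one lyric as raw counts of the selected terms."""
    counts = Counter(tokens)
    return [counts[t] for t in terms]
```

The resulting low-dimensional vectors could then be passed to any standard classifier, alone or concatenated with audio features; which classifier and which fusion scheme the authors used is not stated in the text above.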

A framework for evaluating multimodal music mood classification
Experimental results on a large data set of 18 mood categories show that combining lyrics and audio significantly outperformed systems using audio-only features, and that automatic feature selection techniques reduced the feature space.
Lyric Text Mining in Music Mood Classification
Findings show patterns at odds with findings in previous studies: audio features do not always outperform lyrics features, and combining lyrics and audio features can improve performance in many mood categories, but not all of them.
Automatic music mood classification by learning cross-media relevance between audio and lyrics
This paper proposes a generative multimodal method for automatically classifying the mood of a piece of music based on effective learning of the relevance of the joint distribution between the audio and the lyrics modalities of music.
When Lyrics Outperform Audio for Music Mood Classification: A Feature Analysis
Analysis of the significant lyric feature types indicates a strong and obvious semantic association between extracted terms and the categories; no such obvious semantic linkages were evident in the cases where audio spectral features proved superior.
Improving music mood classification using lyrics, audio and social tags
This dissertation research aims to identify mood categories that are frequently used by real-world music listeners, through an empirical investigation of real-life social tags applied to music, and to advance the technology in automatic music mood classification through a thorough investigation of lyric text analysis and the combination of lyrics and audio.
Multimodal Music Mood Classification by Fusion of Audio and Lyrics
A novel multimodal approach for music mood classification incorporating audio and lyric information, which consists of three key components, including a Hough-forest-based fusion and classification scheme that fuses the two modalities at the finer-grained sentence level, using the time alignment across modalities.
Multimodal Mood Classification Framework for Hindi Songs
A mood taxonomy is proposed and the framework for developing a multimodal dataset (audio and lyrics) for Hindi songs is described, which consists of three different systems based on the features of audio, lyrics and both.
Unsupervised Approach to Hindi Music Mood Classification
In the present task, an unsupervised classifier for Hindi music mood classification is built using different audio-related features such as rhythm, timbre and intensity.
Music Emotion Recognition from Lyrics: A Comparative Study
A study on music emotion recognition from lyrics that builds classifiers for the different datasets, comparing different algorithms and using feature selection to combine the best feature sets of audio and lyrics.
Multimodal Mood Classification - A Case Study of Differences in Hindi and Western Songs
The mood taxonomy is identified, multimodal mood-annotated datasets for Hindi and Western songs are prepared, and important audio and lyric features are identified using a correlation-based feature selection technique.


Integration of Text and Audio Features for Genre Classification in Music Information Retrieval
The nature of text and audio feature sets which describe the same audio tracks are explained and the use of textual data on top of low level audio features for music genre classification is proposed.
A Demonstrator for Automatic Music Mood Estimation
While a subjective evaluation of this algorithm on arbitrary music is ongoing, the initial classification results are encouraging and suggest that an automatic prediction of music mood is possible.
Natural language processing of lyrics
Preliminary results on language identification, structure extraction, categorization and similarity searches suggest that a lot of profit can be gained from the analysis of lyrics, which complements that of acoustic and cultural metadata and is fundamental for the development of complete music information retrieval systems.
Semantic analysis of song lyrics
  • B. Logan, A. Kositsky, P. Moreno
  • Computer Science
    2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE Cat. No.04TH8763)
  • 2004
It is found that lyrics can be used to discover natural genre clusters, and that this approach is better than random but inferior to a state-of-the-art acoustic similarity technique.
Singing in the Brain: Independence of Lyrics and Tunes
Why is vocal music the oldest and still the most popular form of music? Very possibly because vocal music involves an intimate combination of speech and music, two of the most specific, high-level
Emotions from Text: Machine Learning for Text-based Emotion Prediction
This paper explores the text-based emotion prediction problem empirically, using supervised machine learning with the SNoW learning architecture to classify the emotional affinity of sentences in the narrative domain of children's fairy tales, for subsequent usage in appropriate expressive rendering of text-to-speech synthesis.
Support vector machine active learning for music retrieval
In comparing a number of representations for songs, the statistics of mel-frequency cepstral coefficients are found to perform best in precision-at-20 comparisons, and it is shown that by choosing training examples intelligently, active learning requires half as many labeled examples to achieve the same accuracy as a standard scheme.
Expression, Perception, and Induction of Musical Emotions: A Review and a Questionnaire Study of Everyday Listening
In this article, we provide an up-to-date overview of theory and research concerning expression, perception, and induction of emotion in music. We also provide a critique of this research, noting
MIR in Matlab (II): A Toolbox for Musical Feature Extraction from Audio
An overview of the set of features, related, among others, to timbre, tonality, rhythm or form, that can be extracted with the MIRtoolbox, an integrated set of functions written in Matlab dedicated to the extraction of musical features from audio files.
Music perception.
  • D. Deutsch
  • Art
    Frontiers in bioscience : a journal and virtual library
  • 2007
It is shown that, for certain types of configuration, the music as it is perceived can differ substantially from the music that is notated in the score, or as might be imagined from reading the score.