Song Emotion Recognition: a Performance Comparison Between Audio Features and Artificial Neural Networks

Karen Rosero, Arthur Nicholas dos Santos, Pedro Benevenuto Valadares, Bruno S. Masiero
When songs are composed or performed, the singer or songwriter often intends to express feelings or emotions through them. For humans, matching the emotiveness of a musical composition or performance with the subjective perception of an audience can be quite challenging. Fortunately, the machine learning approach to this problem is simpler. Usually, it takes a dataset, from which audio features are extracted and presented to a data-driven model, that will, in turn…
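The pipeline the abstract sketches, extracting hand-crafted audio features and passing them to a data-driven model, can be illustrated with a minimal toy example. The features (zero-crossing rate, RMS energy), the nearest-centroid "model", and the emotion labels below are all illustrative assumptions, not the paper's actual method:

```python
import math
import random

def extract_features(signal):
    """Two classic short-term audio features: zero-crossing rate and RMS energy."""
    zcr = sum(
        1 for a, b in zip(signal, signal[1:]) if (a < 0) != (b < 0)
    ) / max(len(signal) - 1, 1)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return [zcr, rms]

def nearest_centroid(features, centroids):
    """Toy data-driven model: assign the label of the closest class centroid."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Usage: a noisy signal has a high zero-crossing rate, a smooth one a low rate.
random.seed(0)
noisy = [random.uniform(-1, 1) for _ in range(1000)]              # e.g. "energetic"
smooth = [math.sin(2 * math.pi * t / 200) for t in range(1000)]   # e.g. "calm"

centroids = {"energetic": extract_features(noisy), "calm": extract_features(smooth)}
print(nearest_centroid(extract_features(noisy), centroids))   # energetic
```

A real system would use richer features (e.g. MFCCs, spectral descriptors) and a trained classifier, but the two-stage structure is the same.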

Song Emotion Recognition: A Study of the State of the Art

The most common features and models used in recent publications to tackle music emotion recognition are studied, revealing which ones are best suited for songs (particularly a cappella).


This paper surveys the state of the art in automatic emotion recognition in music. Music is oftentimes referred to as a “language of emotion” [1], and it is natural for us to categorize music in…

Emotional classification of music using neural networks with the MediaEval dataset

This work presents an automatic system for emotional classification of music, implementing a neural network based on a previous dimensional emotion prediction system in which a multilayer perceptron (MLP) was trained on the freely available MediaEval database.
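A dimensional predictor of the kind described above maps an audio-feature vector to continuous (valence, arousal) values. The following is a toy forward pass of such an MLP; the layer sizes and random weights are placeholders (in the cited work they would be learned from MediaEval data):

```python
import math
import random

random.seed(42)

def mlp_forward(x, w1, b1, w2, b2):
    """One tanh hidden layer, linear two-value output: (valence, arousal)."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(wi * hi for wi, hi in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

# Placeholder weights: 4 input features, 8 hidden units, 2 outputs.
n_features, n_hidden = 4, 8
w1 = [[random.gauss(0, 0.5) for _ in range(n_features)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.gauss(0, 0.5) for _ in range(n_hidden)] for _ in range(2)]
b2 = [0.0, 0.0]

va = mlp_forward([0.2, -0.1, 0.7, 0.3], w1, b1, w2, b2)
print(va)  # [valence, arousal] estimates
```

Training (e.g. by backpropagation against annotated VA labels) is omitted; the sketch only shows the dimensional-output structure.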

Dynamic Music emotion recognition based on CNN-BiLSTM

Dynamic emotion recognition of music using the valence-arousal (VA) model is described, in which a song produces a sequence of VA values; the approach is then compared with other relevant recognition methods.

Cross-Dataset Music Emotion Recognition: an End-to-End Approach

A novel approach towards emotion detection is tested, and a language-sensitive end-to-end model that learns to tag emotions from music with lyrics in English, Mandarin and Turkish is proposed.

High-Level Analysis of Audio Features for Identifying Emotional Valence in Human Singing

Established audio analysis features are applied to determine whether they can detect underlying emotional valence in human singing; the results indicate that short-term audio features can be useful predictors of emotion, although their efficacy is not consistent across positive and negative emotions.

A Novel Music Emotion Recognition Model for Scratch-generated Music

This paper proposes a novel music emotion recognition model for Scratch-generated music, where the CNN module learns the important features of the music while the RNN learns the sequential features.

A Multilingual Framework of CNN and Bi-LSTM for Emotion Classification

  • Ashima Yadav, D. Vishwakarma
  • Computer Science
  • 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT)
  • 2020
This paper proposes a language-independent, deep learning-based framework for speech emotion classification, developing a unique combination of 1D CNN and Bi-LSTM units: the CNN extracts local information from the signals, while the Bi-LSTM layer models the long-term contextual dependencies of the signal.
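The local-then-contextual structure of such CNN + Bi-LSTM combinations can be sketched in a few lines. This toy version uses a single convolution kernel and a simple non-gated recurrent cell in place of a real LSTM, with fixed weights rather than learned ones; every concrete value is an illustrative assumption:

```python
import math

def conv1d(signal, kernel):
    """1D convolution (valid mode) followed by ReLU: extracts local patterns."""
    k = len(kernel)
    return [
        max(0.0, sum(signal[i + j] * kernel[j] for j in range(k)))
        for i in range(len(signal) - k + 1)
    ]

def recurrent_pass(seq, w_in=0.5, w_rec=0.9):
    """Simple recurrent cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0
    for x in seq:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def bidirectional(seq):
    """Run the cell forward and backward over the sequence, concatenate final states."""
    return [recurrent_pass(seq), recurrent_pass(list(reversed(seq)))]

# Usage: CNN feature maps feed the bidirectional recurrent layer; the two
# final states would then go to a classifier head.
signal = [math.sin(t / 3.0) for t in range(50)]
features = conv1d(signal, kernel=[0.25, 0.5, 0.25])  # fixed smoothing kernel
state = bidirectional(features)
print(state)  # one context summary per direction
```

A real LSTM adds input, forget, and output gates so the cell can retain information over long spans; the backward pass is what lets the model condition on future context as well as past.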