The Role of Time in Music Emotion Recognition: Modeling Musical Emotions from Time-Varying Music Features
Emotional content is a major component in music. It has long been a research topic of interest to discover the acoustic patterns in the music that carry that emotional information, and enable performers to communicate emotional messages to listeners. Previous works looked in the audio signal for local cues, most of which assume monophonic music, and their statistics over time. Here, we used generic audio features, that can be calculated for any audio signal, and focused on the progression of these features through time, investigating how informative the dynamics of the audio is for emotional content. Our data is comprised of piano and vocal improvisations of musically trained performers, instructed to convey 4 categorical emotions. We applied Dynamic Texture Mixture (DTM), that models both the instantaneous sound qualities and their dynamics, and demonstrated the strength of the model. We further showed that once taking the dynamics into account even highly reduced versions of the generic audio features carry a substantial amount of information about the emotional content. Finally, we demonstrate how interpreting the parameters of the trained models can yield interesting cognitive suggestions.