EMOPIA: A Multi-Modal Pop Piano Dataset For Emotion Recognition and Emotion-based Music Generation

Hsiao-Tzu Hung, Joann Ching, Seungheon Doh, Nabin Kim, Juhan Nam, Yi-Hsuan Yang
While there are many music datasets with emotion labels in the literature, they cannot be used for research on symbolic-domain music analysis or generation, as they usually contain audio files only. In this paper, we present the EMOPIA (pronounced 'yee-mo-pi-uh') dataset, a shared multi-modal (audio and MIDI) database focusing on perceived emotion in pop piano music, to facilitate research on various tasks related to music emotion. The dataset contains 1,087 music clips from 387 songs and clip…
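EMOPIA annotates perceived emotion with the four quadrants of Russell's valence-arousal circumplex, and the released clips carry a quadrant tag (Q1-Q4). As a minimal sketch, assuming a clip-naming convention in which the quadrant tag is the leading underscore-separated token (the name `Q3_example_1` below is hypothetical), the tags can be mapped to coarse valence/arousal signs like this:

```python
# Map four-quadrant emotion labels (Russell's valence/arousal model) to
# coarse (valence, arousal) signs:
#   Q1 = high valence, high arousal    Q2 = low valence, high arousal
#   Q3 = low valence, low arousal     Q4 = high valence, low arousal
QUADRANT_TO_VA = {
    "Q1": (+1, +1),
    "Q2": (-1, +1),
    "Q3": (-1, -1),
    "Q4": (+1, -1),
}

def quadrant_from_clip_name(name: str) -> str:
    """Extract the leading quadrant tag from a clip name such as 'Q1_abc_2'."""
    tag = name.split("_", 1)[0]
    if tag not in QUADRANT_TO_VA:
        raise ValueError(f"no quadrant tag in clip name: {name!r}")
    return tag

# Hypothetical clip name following the assumed convention:
valence, arousal = QUADRANT_TO_VA[quadrant_from_clip_name("Q3_example_1")]
print(valence, arousal)  # Q3 = low valence, low arousal
```

Such a mapping is a common preprocessing step when training emotion-conditioned generators or quadrant classifiers on this kind of dataset; any finer-grained valence/arousal values would have to come from the annotations themselves.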


A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition

A simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks derived from the intrinsic structure of the music, is presented.

YM2413-MDB: A Multi-Instrumental FM Video Game Music Dataset with Emotion Annotations

The baseline models and results for emotion recognition and emotion-conditioned symbolic music generation using YM2413-MDB, an 80s FM video game music dataset with multi-label emotion annotations, are provided.

Emotional Quality Evaluation for Generated Music Based on Emotion Recognition Model

An emotional quality evaluation method for generated music is proposed from the perspective of music emotion recognition, built on a residual convolutional network that predicts the emotion category of the generated music.

Symbolic music generation conditioned on continuous-valued emotions

The proposed approaches outperform conditioning with control tokens, which is representative of the current state of the art, and a new large-scale dataset of symbolic music paired with valence and arousal emotion labels is provided.

Music generation based on emotional EEG

In this method, a sequence-to-sequence long short-term memory network is trained on emotional music to obtain emotional music generators, and a support vector machine is used to extract the emotional information.

A Continuous Emotional Music Generation System Based on Facial Expressions

The novelty of the approach is the joint prediction of discrete and continuous emotions through facial expression recognition networks, and then the music generation model is driven by the continuous emotion vector to generate music corresponding to the predicted facial emotion.

Using Deep Learning to Recognize Therapeutic Effects of Music Based on Emotions

The experiment aims to create a machine-learning model that predicts whether a specific song has therapeutic effects on a specific person; the model considers a person's musical and emotional characteristics and is also trained to account for solfeggio frequencies.

Learning To Generate Piano Music With Sustain Pedals

This work employs the transcription model proposed by Kong et al. to extract pedal information from the audio recordings of piano performances in the AILabs1k7 dataset, and modifies the Compound Word Transformer proposed by Hsiao et al.

ComMU: Dataset for Combinatorial Music Generation

The results show that the new combinatorial music generation task, which creates varying background music from given conditions, can generate diverse high-quality music from metadata alone, and that unique metadata such as track-role and extended chord quality improve the capacity of automatic composition.

MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding

An attempt to employ the masked language modeling approach of BERT to pre-train a 12-layer Transformer model for a number of symbolic-domain discriminative music understanding tasks, finding that, given a pretrained Transformer, the models outperform recurrent neural network baselines with fewer than 10 epochs of fine-tuning.

Ranking-Based Emotion Recognition for Experimental Music

This study presents a crowdsourcing method that is used to collect ground truth via ranking the valence and arousal of music clips, and proposes a smoothed RankSVM (SRSVM) method that outperforms four other ranking algorithms.

Multi-Modal Music Emotion Recognition: A New Dataset, Methodology and Comparative Analysis

A methodology is introduced for the automatic creation of a multi-modal music emotion dataset drawing on the AllMusic database, based on the emotion tags used in the MIREX Mood Classification Task.

1000 songs for emotional analysis of music

This work presents a new publicly available dataset for music emotion recognition research and a baseline system; the dataset consists entirely of Creative Commons music from the Free Music Archive, which can be shared freely without penalty.

The multiple voices of musical emotions: source separation for improving music emotion recognition models and their interpretability

A new computational model (EmoMucs) is proposed that considers the role of different musical voices in the prediction of the emotions induced by music and outperforms state-of-the-art approaches with the advantage of providing insights into the relative contribution of different music elements to the emotions perceived by listeners.

Exploration of Music Emotion Recognition Based on MIDI

It is found that melody is more important than accompaniment for valence regression, in contrast to arousal, and that the chorus of an edited MIDI may contain as much information as the entire edited MIDI for valence regression.

Audio Features for Music Emotion Recognition: A Survey

This article presents a survey on the existing emotionally-relevant computational audio features, supported by the music psychology literature on the relations between eight musical dimensions and specific emotions.

Music Emotion Recognition

A computational framework is presented that generalizes emotion recognition from the categorical domain to a real-valued 2D space, and techniques are detailed for addressing the ambiguity and granularity of emotion description, the heavy cognitive load of emotion annotation, the subjectivity of emotion perception, and the semantic gap between low-level audio signals and high-level emotion perception.

Emo-soundscapes: A dataset for soundscape emotion recognition

A dataset of audio samples called Emo-Soundscapes and two evaluation protocols for benchmarking soundscape emotion recognition (SER) models are proposed, and how the mixing of various soundscape recordings influences their perceived emotion is studied.

Joyful for you and tender for us: the influence of individual characteristics and language on emotion labeling and classification

The results suggest that (a) applying a broader categorization of emotion taxonomies and (b) using multi-label, group-based annotations based on language can be beneficial for MER models.

SentiMozart: Music Generation based on Emotions

The aim of the proposed framework is to generate music corresponding to the emotion of the person predicted by the model, which is essentially a Doubly Stacked LSTM architecture.