Corpus ID: 233481524

Emotion Recognition of the Singing Voice: Toward a Real-Time Analysis Tool for Singers

Daniel Szelogowski
Current computational-emotion research has focused on applying acoustic properties to analyze how emotions are perceived mathematically or used in natural-language-processing machine learning models. While recent interest has centered on analyzing emotions from the spoken voice, little experimentation has been performed to discover how emotions are recognized in the singing voice, in both noiseless and noisy data (i.e., data that is inaccurate, difficult to interpret, has corrupted…


Identifying Emotions in Opera Singing: Implications of Adverse Acoustic Conditions
The findings show that the three noise types affect female and male singers similarly and that listener gender played no role; the performance of state-of-the-art automatic recognition methods was also evaluated.
Emotions Understanding Model from Spoken Language using Deep Neural Networks and Mel-Frequency Cepstral Coefficients
This work presents a classification model of emotions elicited by speech, based on deep convolutional neural networks (CNNs), trained to classify eight emotions: the ones proposed by Ekman plus neutral and calm.
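The mel-frequency cepstral coefficients named in the title can be sketched in plain numpy (frame, window, power spectrum, mel filterbank, log, DCT-II). All parameter values below (sample rate, frame size, filter counts) are illustrative defaults, not taken from the paper:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=256, n_mels=26, n_ceps=13):
    """Compute MFCCs from a mono signal: frame -> window -> power
    spectrum -> mel filterbank -> log -> DCT-II."""
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] for i in range(n_frames)])
    frames = frames * np.hamming(n_fft)

    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank (mel scale: 2595 * log10(1 + f / 700)).
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Log mel energies, then DCT-II to decorrelate into cepstral coefficients.
    log_mel = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

# Example: MFCCs of one second of a 440 Hz tone.
t = np.linspace(0, 1, 16000, endpoint=False)
feats = mfcc(np.sin(2 * np.pi * 440 * t))  # shape: (frames, coefficients)
```

The resulting per-frame coefficient matrix is the kind of input such a CNN classifier would consume.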
Emotion in the singing voice—a deeper look at acoustic features in the light of automatic classification
A small set of relevant acoustic features is proposed based on previous findings on the same data and compared with a large-scale state-of-the-art feature set for paralinguistics recognition: the baseline feature set of the Interspeech 2013 Computational Paralinguistics Challenge (ComParE).
Singing voice separation using a deep convolutional neural network trained by ideal binary mask and cross entropy
A neural network approach inspired by pixel-wise image classification, a technique that has revolutionized computer vision, is presented; it is combined with cross-entropy loss and pretraining of the CNN as an autoencoder on singing-voice spectrograms.
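The ideal binary mask used as a training target in this line of work is straightforward to construct when the isolated sources are available. A toy numpy sketch, with random matrices standing in for real STFT magnitude spectrograms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy magnitude spectrograms (freq bins x frames); in practice these come
# from STFTs of isolated vocal and accompaniment tracks.
vocal = rng.random((257, 100))
accomp = rng.random((257, 100))
mixture = vocal + accomp  # magnitudes add only approximately in reality

# Ideal binary mask: 1 in time-frequency bins where the vocal dominates.
ibm = (vocal > accomp).astype(float)

# Applying the mask to the mixture keeps the vocal-dominated bins.
vocal_est = ibm * mixture
```

The network is then trained to predict `ibm` from `mixture` alone, turning separation into per-bin binary classification.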
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English
The RAVDESS is a validated multimodal database of emotional speech and song consisting of 24 professional actors, vocalizing lexically-matched statements in a neutral North American accent, which shows high levels of emotional validity and test-retest intrarater reliability.
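RAVDESS encodes each recording's labels directly in its filename as seven two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). A minimal parser following that documented convention; the sample filename below is made up for illustration:

```python
# Emotion codes from the RAVDESS filename convention (third field).
EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def parse_ravdess(filename):
    """Split a RAVDESS filename into its seven two-digit identifier fields."""
    parts = filename.removesuffix(".wav").split("-")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    return {
        "vocal_channel": "speech" if channel == "01" else "song",
        "emotion": EMOTIONS[emotion],
        "intensity": "normal" if intensity == "01" else "strong",
        "actor": int(actor),
        # Odd-numbered actors are male, even-numbered are female.
        "actor_sex": "male" if int(actor) % 2 == 1 else "female",
    }

info = parse_ravdess("03-02-05-01-01-01-07.wav")
# -> audio-only, song, "angry", normal intensity, actor 7 (male)
```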
Vocal separation using nearest neighbours and median filtering
A novel vocal separator inspired by single-channel vocal separation algorithms is presented: it finds the k nearest neighbours to each frame of the mixture spectrogram, and their median is then used as the estimate of the background music at that frame.
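The nearest-neighbours-plus-median scheme can be sketched as follows; the spectrogram is random toy data and the value of `k` is an arbitrary choice, not one from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.random((129, 100))  # toy magnitude spectrogram (bins x frames)
k = 5

# Euclidean distances between every pair of frames.
frames = S.T  # (n_frames, n_bins)
d = np.linalg.norm(frames[:, None, :] - frames[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)  # a frame is not its own neighbour

# Background estimate: median over each frame's k nearest neighbours
# (repeating accompaniment frames resemble each other; vocals do not).
nn = np.argsort(d, axis=1)[:, :k]
background = np.median(frames[nn], axis=1).T  # (bins, frames)

# Vocal estimate via a soft mask derived from the background estimate.
mask = np.clip(1 - background / np.maximum(S, 1e-10), 0, 1)
vocal_est = mask * S
```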
Music/Voice Separation Using the Similarity Matrix
Evaluation on a data set of 14 full-track real-world pop songs showed that using a similarity matrix can improve separation performance over a previous repetition-based source separation method and recent competitive music/voice separation methods, while remaining computationally efficient.
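A frame-wise similarity matrix of the kind this method relies on can be computed with cosine similarity between spectrogram frames; a toy sketch on random data (the repetition-detection and masking stages of the full method are omitted):

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.random((129, 100))  # toy magnitude spectrogram (bins x frames)

# Cosine similarity between every pair of frames.
F = S.T / np.linalg.norm(S.T, axis=1, keepdims=True)
sim = F @ F.T  # (n_frames, n_frames), values in [-1, 1]

# For each frame, its most similar (i.e., repeating) frame is read off
# the rows of the similarity matrix, excluding the frame itself.
np.fill_diagonal(sim, -np.inf)
most_similar = np.argmax(sim, axis=1)
```

Frames selected this way feed the same median-based background estimation as in the nearest-neighbour approach above.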
HMM-Based Audio Keyword Generation
Motivated by the success of HMMs in speech recognition, experimental results show that the proposed HMM-based method outperforms the previous hierarchical-SVM approach for audio keyword generation.
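The scoring step of an HMM-based keyword system amounts to evaluating each candidate keyword model's likelihood for an observation sequence with the forward algorithm and picking the best. A minimal discrete-HMM sketch; the two keyword models and the quantized feature sequence are made up for illustration, not taken from the paper:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (pi: initial distribution, A: transitions, B: emissions), computed
    with the scaled forward algorithm to avoid underflow."""
    alpha = pi * B[:, obs[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        log_p += np.log(s)
        alpha = alpha / s
    return log_p

# Two toy 2-state keyword models over a 3-symbol feature alphabet.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B_applause = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])
B_cheer = np.array([[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]])

obs = [0, 0, 1, 0, 1]  # quantized audio feature sequence
scores = {
    "applause": forward_log_likelihood(obs, pi, A, B_applause),
    "cheer": forward_log_likelihood(obs, pi, A, B_cheer),
}
keyword = max(scores, key=scores.get)  # model with the highest likelihood
```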
Speech emotion recognition
Twenty years of progress in making machines hear human emotions from speech signal properties is traced, with a focus on artificial intelligence and machine learning.
Emotion and Empathy: How Voice Can Save the Culture
I began writing this column in early November of 2016, with less than a week to go before the elections in the United States of America. When it was finished, the election was over and had culminated…