FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition

Bonaventure F. P. Dossou, Yeno K. S. Gbenou
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

Using mel-spectrograms instead of conventional MFCC features, we assess the ability of convolutional neural networks to accurately recognize and classify emotions from speech data. We introduce FSER, a speech emotion recognition model trained on four validated speech databases, which achieves a classification accuracy of 95.05% over 8 emotion classes: anger, anxiety, calm, disgust, happiness, neutral, sadness, and surprise. On each benchmark dataset, FSER outperforms the best models introduced…
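The abstract contrasts mel-spectrograms with MFCCs but does not state FSER's preprocessing settings. The NumPy sketch below illustrates how the two representations relate. A log-mel spectrogram applies triangular mel filters to a short-time power spectrum. MFCCs are a truncated DCT of those log-mel bands, so they keep mainly the smooth spectral envelope, which is one commonly cited reason for giving a CNN the full mel-spectrogram instead. The sample rate, FFT size, hop length, band counts, the HTK mel formula, and all function names are assumptions chosen for illustration. They are not the paper's configuration.

```python
import numpy as np

# NOTE: all parameters and names below are illustrative assumptions,
# not FSER's actual preprocessing.

def hz_to_mel(f):
    # HTK-style mel scale (assumption; other libraries use the Slaney variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft//2 + 1 FFT bins onto n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 edge points, equally spaced on the mel scale up to Nyquist
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=160, n_mels=64, eps=1e-10):
    """Return a (n_mels, n_frames) log-mel spectrogram of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)           # Hann-windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft//2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    return np.log(mel + eps)                           # eps avoids log(0)

def mfcc_from_log_mel(log_mel, n_mfcc=13):
    """MFCCs: orthonormal DCT-II over the mel axis, keeping the first n_mfcc rows."""
    n_mels = log_mel.shape[0]
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    dct[0] /= np.sqrt(2.0)  # orthonormal scaling of the k = 0 row
    return dct @ log_mel    # (n_mfcc, n_frames)

# Usage: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
S = log_mel_spectrogram(0.5 * np.sin(2 * np.pi * 440.0 * t), sr=sr)
M = mfcc_from_log_mel(S)
print(S.shape, M.shape)  # (64, 97) (13, 97)
```

The 64 x 97 log-mel array is image-like, so a 2-D CNN can convolve over it directly. The 13 MFCC rows compress each frame's spectral shape into a handful of coefficients.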
