Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels (Extended abstract)

  • Chung-Hsien Wu, Wei-Bin Liang
  • Published 2015
  • Computer Science
  • 2015 International Conference on Affective Computing and Intelligent Interaction (ACII)
This work presents an approach to emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information (AP) and semantic labels (SLs). For AP-based recognition, acoustic and prosodic features are extracted from the detected emotional salient segments of the input speech. Three types of models (GMMs, SVMs, and MLPs) are adopted as the base-level classifiers. A Meta Decision Tree (MDT) is then employed for classifier fusion to obtain the AP-based emotion… 
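The fusion scheme summarized in the abstract (GMM, SVM, and MLP base classifiers combined by a meta-level decision tree) can be approximated as a stacking ensemble. The sketch below is illustrative, not the paper's implementation: it substitutes scikit-learn's `StackingClassifier` with a plain decision tree as the meta-learner for the MDT, wraps per-class Gaussian mixtures as a likelihood-based classifier, and uses synthetic data in place of acoustic-prosodic features.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


class GMMClassifier(BaseEstimator, ClassifierMixin):
    """One GaussianMixture per class; classify by maximum log-likelihood."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = {
            c: GaussianMixture(n_components=self.n_components,
                               random_state=0).fit(X[y == c])
            for c in self.classes_
        }
        return self

    def predict_proba(self, X):
        # Per-class log-likelihoods -> softmax posterior (equal class priors)
        ll = np.column_stack([self.models_[c].score_samples(X)
                              for c in self.classes_])
        ll -= ll.max(axis=1, keepdims=True)  # stabilise before exponentiating
        p = np.exp(ll)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


# Stand-in for extracted acoustic-prosodic feature vectors and emotion labels
X, y = make_classification(n_samples=300, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)

fusion = StackingClassifier(
    estimators=[("gmm", GMMClassifier()),
                ("svm", SVC(probability=True, random_state=0)),
                ("mlp", MLPClassifier(max_iter=1000, random_state=0))],
    final_estimator=DecisionTreeClassifier(max_depth=3, random_state=0),
).fit(X, y)
```

Stacking trains the meta-learner on out-of-fold base-classifier posteriors, which is the same division of labor the abstract describes, even though the MDT itself makes per-instance choices among classifiers rather than learning on their outputs generically.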

Figures and Tables from this paper

A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances
This systematic overview of affective computing reviews recent advances, surveys and taxonomizes state-of-the-art unimodal affect recognition and multimodal affective analysis in terms of their detailed architectures and performance, and concludes with an indication of the most promising future directions.
Speech emotion recognition with unsupervised feature learning
Several unsupervised feature learning algorithms, including K-means clustering, the sparse auto-encoder, and sparse restricted Boltzmann machines, which have promise for learning task-related features by using unlabeled data, are applied to speech emotion recognition.
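The K-means route mentioned in this entry can be sketched as codebook learning on unlabeled data followed by distance-based encoding of labeled examples. Everything below is illustrative (hypothetical data and dimensions; the "triangle" encoding follows Coates-style feature learning rather than this paper's exact recipe):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical stand-in for unlabeled frame-level acoustic features
X_unlabeled = rng.normal(size=(500, 20))

# 1. Learn a codebook with K-means on the unlabeled pool
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X_unlabeled)

def encode(X, km):
    """Soft 'triangle' encoding: activate centroids closer than average."""
    d = km.transform(X)                      # (n, k) distances to centroids
    mu = d.mean(axis=1, keepdims=True)
    return np.maximum(0.0, mu - d)

# 2. Use the learned encoding as features for a small labeled set
X_labeled = rng.normal(size=(100, 20))
y = (X_labeled[:, 0] > 0).astype(int)        # toy labels for illustration
clf = LogisticRegression(max_iter=1000).fit(encode(X_labeled, kmeans), y)
```

The point of the encoding step is that the codebook is learned without labels, so large unlabeled corpora can shape the representation before the small labeled emotion set is touched.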
Sentiment analysis using Hierarchical Multimodal Fusion (HMF)
The sentiment analysis of videos is investigated, with data available in three modalities (audio, video, and text) and a novel approach to speaker-independent fusion: using deep learning to fuse the modalities in a hierarchical fashion.
Speech emotion recognition using data augmentation
A deep learning model for speech emotion recognition based on GRUs that takes the filterbank energies of the speech signal as input; to overcome limited database availability and increase the number of input samples, the authors apply data augmentation.
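Waveform-level augmentation of the kind this entry describes can be sketched with two common transforms, additive noise at a target SNR and a random time shift. These particular transforms and parameters are assumptions for illustration, not necessarily the ones used in the cited paper:

```python
import numpy as np

def augment(signal, rng, noise_snr_db=20.0, max_shift=800):
    """Additive noise at roughly the requested SNR, then a random circular
    time shift. Illustrative only; other schemes (speed/pitch perturbation,
    SpecAugment) are equally common."""
    sig_power = np.mean(signal ** 2)
    noise_power = sig_power / (10 ** (noise_snr_db / 10))
    noisy = signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return np.roll(noisy, rng.integers(-max_shift, max_shift + 1))

rng = np.random.default_rng(0)
wave = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s synthetic tone
extra = [augment(wave, rng) for _ in range(4)]  # 4 augmented copies
```

Each original utterance thus yields several perturbed copies, multiplying the effective training-set size without new recordings.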


Combining spectral and prosodic information for emotion recognition in the interspeech 2009 emotion challenge
This paper describes the system presented at the Interspeech 2009 Emotion Challenge, a fusion system based on Support Vector Machines that relies on both spectral and prosodic features in order to automatically detect the emotional state of the speaker.
Passion and reason: making sense of our emotions
Partial contents: What This Book Is About; Part 1: Portraits of the Individual Emotions; Part 2: How to Understand the Emotions; Part 3: Practical Implications; Final Thoughts.
Probabilities for SV Machines
This chapter contains sections titled: Introduction, Fitting a Sigmoid After the SVM, Empirical Tests, Conclusions, Appendix: Pseudo-code for the Sigmoid Training
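The "fitting a sigmoid after the SVM" step this chapter covers (Platt scaling) can be sketched by fitting a logistic regression to held-out SVM decision scores; scikit-learn packages the same idea as `CalibratedClassifierCV(method="sigmoid")`. The data and split below are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Train an SVM, then map its scores to probabilities with a sigmoid
X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5,
                                            random_state=0)

svm = SVC(kernel="linear", random_state=0).fit(X_tr, y_tr)
scores = svm.decision_function(X_cal).reshape(-1, 1)

# Logistic regression on held-out scores approximates Platt's sigmoid
# P(y=1 | f) = 1 / (1 + exp(A*f + B))
sigmoid = LogisticRegression().fit(scores, y_cal)
probs = sigmoid.predict_proba(scores)[:, 1]
```

Fitting the sigmoid on data the SVM did not train on matters: training-set scores are biased toward the margins, which is why Platt recommends held-out or cross-validated scores.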
Pattern recognition: a statistical approach
Emotion Perception and Recognition from Speech
This chapter begins by introducing the correlations between basic speech features (such as pitch, intensity, formants, and MFCCs) and the emotions, and then describes several recognition methods to illustrate the performance of the previously proposed models.
Two-Level Fusion to Improve Emotion Classification in Spoken Dialogue Systems
The results show that the first fusion module significantly increases the classification rates of a baseline and the classifiers working separately, as has been observed previously in the literature.
Speech act modeling and verification of spontaneous speech with disfluency in a spoken dialogue system
In this approach, the semantic information, syntactic structure, and fragment class of an input utterance are statistically encapsulated in a proposed speech act hidden Markov model (SAHMM), which characterizes the speech act and alleviates the disfluency problem in spontaneous speech.