An Ensemble Framework of Voice-Based Emotion Recognition System for Films and TV Programs

@article{Tao2018AnEF,
  title={An Ensemble Framework of Voice-Based Emotion Recognition System for Films and TV Programs},
  author={Fei Tao and Gang Liu and Qingen Zhao},
  journal={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2018},
  pages={6209-6213}
}
  • Published 3 March 2018
  • Computer Science, Engineering
Employing a voice-based emotion recognition function in artificial intelligence (AI) products will improve the user experience. Most research to date has focused only on speech collected under controlled conditions, and the scenarios evaluated in these studies were well controlled. Conventional approaches may fail when background noise or non-speech fillers are present. In this paper, we propose an ensemble framework combining several aspects of audio features. The framework… 
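The abstract describes an ensemble combining several kinds of audio features. One common realisation of such a framework (a sketch only, not necessarily the paper's exact method) is late fusion: each feature stream drives its own classifier, and the per-class posteriors are combined by a weighted average. The function name and uniform default weights below are illustrative assumptions.

```python
import numpy as np

def fuse_posteriors(posteriors, weights=None):
    """Late-fusion ensemble: weighted average of per-model class posteriors.

    posteriors: list of 1-D arrays, one per model, each summing to 1.
    weights:    optional per-model weights (default: uniform).
    """
    P = np.stack(posteriors)          # shape (n_models, n_classes)
    if weights is None:
        weights = np.ones(len(posteriors))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                   # normalise weights to sum to 1
    return w @ P                      # fused posterior, shape (n_classes,)

# Two hypothetical feature-stream classifiers disagree; fusion averages them.
p_fused = fuse_posteriors([np.array([0.7, 0.2, 0.1]),
                           np.array([0.3, 0.6, 0.1])])
# p_fused is [0.5, 0.4, 0.1]; the ensemble hedges between the two votes.
```

Weighted averaging keeps the fused scores a valid probability distribution, which makes downstream thresholding or rejection straightforward.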
Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions
TLDR
This study uses emotional speech data to train regression-based speech enhancement models, which are shown to benefit noisy speech emotion recognition, and adopts an LSTM architecture for the enhancement model whose hidden layers follow a simple densely connected progressive-learning design.
Speech Emotion Recognition via Attention-based DNN from Multi-Task Learning
TLDR
A real-world large-scale corpus composed of 4 common emotions is constructed, and a multi-task attention-based DNN model (i.e., MT-A-DNN) is proposed for emotion learning, which efficiently learns the high-order dependencies and non-linear correlations underlying the audio data.
Speech Emotion Recognition Based on Selective Interpolation Synthetic Minority Over-Sampling Technique in Small Sample Environment
TLDR
This study proposes a speech emotion recognition model for small-sample environments based on the selective interpolation synthetic minority over-sampling technique (SISMOTE); a feature selection method based on variance analysis and gradient boosting decision trees (GBDT) is also introduced, which excludes redundant features with poor emotional representation.
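SISMOTE builds on classic SMOTE, whose core step is linear interpolation between a minority-class sample and one of its minority-class neighbours. A minimal sketch of that core step follows (the selective-interpolation criterion that distinguishes SISMOTE is not reproduced here; the function name is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_interpolate(x, neighbor, alpha=None):
    """Synthesise one minority-class sample on the line segment
    between a sample x and one of its minority-class neighbours."""
    if alpha is None:
        alpha = rng.uniform()         # random position on the segment
    return x + alpha * (neighbor - x)

x = np.array([1.0, 1.0])
nb = np.array([3.0, 1.0])
synthetic = smote_interpolate(x, nb, alpha=0.5)  # midpoint: [2.0, 1.0]
```

In a full oversampler this step is repeated with randomly chosen neighbours until the minority class reaches the desired size.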
Feature Extraction from Spectrums for Speech Emotion Recognition
A speech emotion recognition (SER) system is a collection of methods that process and classify speech signals to detect the embedded emotions. In this work, we will focus on the feature processing
Ensemble of Students Taught by Probabilistic Teachers to Improve Speech Emotion Recognition
TLDR
This study uses uncertainty modeling with Monte-Carlo (MC) dropout to create a distribution over the embeddings of an intermediate dense layer of the teacher, and uses this distribution to train an ensemble of students, improving their performance in a semi-supervised manner.
MMTrans-MT: A Framework for Multimodal Emotion Recognition Using Multitask Learning
TLDR
In this work, MMTrans-MT (Multimodal Transformer-Multitask), the framework for multimodal emotion recognition using multitask learning is proposed, which has three modules: modalities representation module, multi-modalities fusion module, and multitask output module.
Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning
TLDR
A speaker adversarial loss is adopted in order to obtain speaker-independent linguistic representations using the recognizer and a generative adversarial network (GAN) loss is used to prevent the predicted features from being over-smoothed.
Multimodal Relational Tensor Network for Sentiment and Emotion Classification
TLDR
The Relational Tensor Network architecture is presented, which models both the inter-modal interactions within a segment (intra-segment) and the sequence of segments in a video; the model outperforms many baselines and state-of-the-art methods for sentiment classification and emotion recognition.
End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models
TLDR
This study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training, improving the accuracy and robustness of the proposed SAD system.
Spatio-Temporal Representation of an Electroencephalogram for Emotion Recognition Using a Three-Dimensional Convolutional Neural Network
TLDR
This paper proposes a novel emotion recognition method based on three-dimensional convolutional neural networks (3D CNNs) with an efficient spatio-temporal representation of EEG signals, and demonstrates the accuracy of the proposed method's emotion classification.

References

Showing 1-10 of 25 references
Fisher Kernels on Phase-Based Features for Speech Emotion Recognition
TLDR
This chapter proposes to use phase-based features to build up an emotion recognition system using Fisher kernels, and encodes the phase-based features by their deviation from a generative Gaussian mixture model.
Automatic speech emotion recognition using recurrent neural networks with local attention
TLDR
This work studies the use of deep learning to automatically discover emotionally relevant features from speech and proposes a novel strategy for feature pooling over time which uses local attention in order to focus on specific regions of a speech signal that are more emotionally salient.
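The local-attention pooling described above can be sketched as a softmax-weighted average over time: each frame gets a salience score, and the utterance-level vector is the weighted sum of frame features. The scoring network that produces the scores is omitted, and the names here are illustrative, not the paper's implementation:

```python
import numpy as np

def attention_pool(frames, scores):
    """Pool a variable-length sequence of frame features into one
    utterance vector, weighting frames by salience scores."""
    a = np.exp(scores - scores.max())  # numerically stable softmax
    a = a / a.sum()                    # attention weights over time
    return a @ frames                  # weighted sum, shape (feat_dim,)

frames = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])  # 3 frames x 2 dims
scores = np.array([0.0, 0.0, 0.0])    # equal scores reduce to a plain mean
pooled = attention_pool(frames, scores)  # [2.0, 3.0]
```

With learned, unequal scores the pooling concentrates on emotionally salient regions of the signal instead of averaging all frames uniformly.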
MEC 2016: The Multimodal Emotion Recognition Challenge of CCPR 2016
TLDR
The baseline audio and visual features, and the recognition results obtained with Random Forests, are introduced.
DBN-ivector Framework for Acoustic Emotion Recognition
TLDR
This work proposes a framework based on a deep belief network (DBN) and i-vector space modeling for acoustic emotion recognition, and shows significant improvements in both unweighted and weighted accuracy with decision-level combination.
A Multi-Task Learning Framework for Emotion Recognition Using 2D Continuous Space
  • Rui Xia, Y. Liu
  • Computer Science, Psychology
  • IEEE Transactions on Affective Computing
  • 2017
TLDR
The experimental results on the Interactive Emotional Dyadic Motion Capture and Sustained Emotionally Colored Machine-Human Interaction Using Nonverbal Expression databases show significant improvements on unweighted accuracy, illustrating the benefit of utilizing additional information in a multi-task learning setup for emotion recognition.
Hidden Markov model-based speech emotion recognition
TLDR
The paper addresses the design of working recognition engines and the results achieved with the alternatives considered, and describes a speech corpus consisting of acted and spontaneous emotion samples in German and English.
On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues
TLDR
This work presents a novel approach to on-line emotion recognition from speech using Long Short-Term Memory recurrent neural networks, in which recognition is performed on low-level signal frames similar to those used for speech recognition.
Emotion recognition based on phoneme classes
TLDR
It was found that (the spectral properties of) vowel sounds were the best indicator of emotions in terms of classification performance, and that the best performance is obtained by using phoneme-class classifiers rather than a generic "emotional" HMM classifier or classifiers based on global prosodic features.
Bimodal Recurrent Neural Network for Audiovisual Voice Activity Detection
TLDR
A bimodal recurrent neural network (RNN) which combines audiovisual features in a principled, unified framework, capturing the timing dependency within modalities and across modalities is proposed.
Analysis of emotion recognition using facial expressions, speech and multimodal information
TLDR
Results reveal that the system based on facial expressions gave better performance than the system based on acoustic information alone for the emotions considered, and that when these two modalities are fused, the performance and robustness of the emotion recognition system improve measurably.