Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition

@inproceedings{Zhang2016MultimodalDC,
  title={Multimodal Deep Convolutional Neural Network for Audio-Visual Emotion Recognition},
  author={Shiqing Zhang and Shiliang Zhang and Tiejun Huang and Wen Gao},
  booktitle={ICMR},
  year={2016}
}
Emotion recognition is a challenging task because of the emotional gap between subjective emotions and low-level audio-visual features. Inspired by the recent success of deep learning in bridging the semantic gap, this paper proposes to bridge the emotional gap with a multimodal Deep Convolutional Neural Network (DCNN), which fuses audio and visual cues in a deep model. This multimodal DCNN is trained in two stages. First, two DCNN models pre-trained on large-scale image data are…
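The fusion idea in the abstract — two modality-specific DCNN branches whose features are combined in a jointly trained classifier — can be sketched in simplified form. This is not the authors' architecture: the feature dimensions, the linear fusion layer, the random placeholder weights, and the six-class output are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical embeddings produced by two pre-trained DCNN branches:
# a 128-d audio feature and a 512-d visual feature for one sample.
audio_feat = rng.standard_normal((1, 128))
visual_feat = rng.standard_normal((1, 512))

# Fusion stage: concatenate the modality embeddings and classify with
# a linear layer. The weights here are random placeholders standing in
# for the fine-tuned fusion network described in the paper.
fused = np.concatenate([audio_feat, visual_feat], axis=1)  # shape (1, 640)
num_emotions = 6  # assumed number of emotion categories
W = rng.standard_normal((fused.shape[1], num_emotions)) * 0.01
b = np.zeros(num_emotions)
probs = softmax(fused @ W + b)  # per-class emotion probabilities
```

Concatenation-based (feature-level) fusion like this lets the classifier exploit cross-modal correlations, in contrast to decision-level fusion, which would average the two branches' independent predictions.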

Citations

Publications citing this paper.
Showing 1-10 of 25 citations.

Learning Affective Features With a Hybrid Deep Model for Audio–Visual Emotion Recognition

  • IEEE Transactions on Circuits and Systems for Video Technology
  • 2018

A Combined Reinforcement Regression Model Based on Weighted Feedback for Multimodal Emotion Recognition

  • 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA)
  • 2019

An End-to-End Multimodal Voice Activity Detection Using WaveNet Encoder and Residual Networks

  • IEEE Journal of Selected Topics in Signal Processing
  • 2019

A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization

  • 2018 International Joint Conference on Neural Networks (IJCNN)
  • 2018

References

Publications referenced by this paper.

Recognizing Human Emotional State From Audiovisual Signals

  • IEEE Transactions on Multimedia
  • 2008