Summary of MuSe 2020: Multimodal Sentiment Analysis, Emotion-target Engagement and Trustworthiness Detection in Real-life Media

@article{Stappen2020SummaryOM,
  title={Summary of MuSe 2020: Multimodal Sentiment Analysis, Emotion-target Engagement and Trustworthiness Detection in Real-life Media},
  author={Lukas Stappen and Bj{\"o}rn Schuller and Iulia Lefter and E. Cambria and Yiannis Kompatsiaris},
  journal={Proceedings of the 28th ACM International Conference on Multimedia},
  year={2020}
}
The first Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 was a Challenge-based Workshop held in conjunction with ACM Multimedia'20. It addresses three distinct 'in-the-wild' Sub-challenges: sentiment/emotion recognition (MuSe-Wild), emotion-target engagement (MuSe-Target) and trustworthiness detection (MuSe-Trust). A large multimedia dataset, MuSe-CaR, was used, which was specifically designed with the intention of improving machine understanding approaches of how sentiment (e.g… 
MuSe 2021 Challenge: Multimodal Emotion, Sentiment, Physiological-Emotion, and Stress Detection
TLDR
The motivation, the sub-challenges, the challenge conditions, the participation, and the most successful approaches are described.
The MuSe 2021 Multimodal Sentiment Analysis Challenge: Sentiment, Emotion, Physiological-Emotion, and Stress
TLDR
This paper utilises the MuSe-CaR dataset focusing on user-generated reviews and introduces the Ulm-TSST dataset, which displays people in stressful depositions, and provides detail on the state-of-the-art feature sets extracted from these datasets for utilisation by the baseline model, a Long Short-Term Memory-Recurrent Neural Network.
The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements
TLDR
This contribution presents MuSe-CaR, a first of its kind multimodal dataset that focused on the tasks of emotion, emotion-target engagement, and trustworthiness recognition by means of comprehensively integrating the audio-visual and language modalities and gives a thorough overview of the dataset in terms of collection and annotation.
Sentiment Analysis and Topic Recognition in Video Transcriptions
TLDR
This article uses SenticNet to extract natural language concepts and fine-tune several feature types on a subset of MuSe-CAR to explore the content of a video as well as learning to predict emotional valence, arousal, and speaker topic classes.
An Estimation of Online Video User Engagement from Features of Continuous Emotions
TLDR
It is demonstrated that smaller boundary ranges and fluctuations for arousal lead to an increase in user engagement, and an effective combination of features is outlined for approaches aiming to automatically predict several user engagement indicators.
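As an illustration of the kind of arousal descriptors this finding refers to (boundary range and fluctuation of a continuous emotion trace), a minimal sketch follows; the function name and the exact statistics are assumptions for illustration, not the paper's actual feature set.

```python
import numpy as np

def arousal_engagement_features(arousal: np.ndarray) -> dict:
    """Summarise a continuous arousal trace into two simple descriptors
    of the kind related to user engagement: boundary range and fluctuation."""
    # Boundary range: spread between the most extreme arousal values.
    value_range = float(np.max(arousal) - np.min(arousal))
    # Fluctuation: mean absolute frame-to-frame change of the signal.
    fluctuation = float(np.mean(np.abs(np.diff(arousal))))
    return {"arousal_range": value_range, "arousal_fluctuation": fluctuation}

# A calm trace yields a smaller range/fluctuation than an erratic one.
calm = 0.1 * np.sin(np.linspace(0, 3, 500))
erratic = np.random.default_rng(0).uniform(-1, 1, 500)
print(arousal_engagement_features(calm))
print(arousal_engagement_features(erratic))
```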
MuSe-Toolbox: The Multimodal Sentiment Analysis Continuous Annotation Fusion and Discrete Class Transformation Toolbox
TLDR
This work introduces the MuSe-Toolbox - a Python-based open-source toolkit for creating a variety of continuous and discrete emotion gold standards and proposes the novel Rater Aligned Annotation Weighting (RAAW), which aligns the annotations in a translation-invariant way before weighting and fusing them based on the inter-rater agreements between the annotations.
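RAAW itself is implemented in the MuSe-Toolbox; the snippet below is only a simplified, hypothetical sketch of agreement-weighted fusion of continuous annotations (mean-centring as a crude translation-invariant alignment, mean pairwise correlation as the agreement measure), not the toolbox API or the published algorithm.

```python
import numpy as np

def fuse_annotations(annotations: np.ndarray) -> np.ndarray:
    """Simplified agreement-weighted fusion of continuous annotations.

    annotations: array of shape (n_raters, n_frames). Each trace is
    mean-centred, weighted by its average correlation with the other
    raters, and the weighted traces are summed into one gold standard.
    """
    centred = annotations - annotations.mean(axis=1, keepdims=True)
    n_raters = centred.shape[0]
    corr = np.corrcoef(centred)                       # (n_raters, n_raters)
    # Agreement of each rater = mean correlation with all other raters.
    agreement = (corr.sum(axis=1) - 1.0) / (n_raters - 1)
    weights = np.clip(agreement, 0.0, None)
    weights = weights / weights.sum()
    return weights @ centred                          # fused trace, (n_frames,)

rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 10, 200))
raters = np.stack([signal + rng.normal(0, s, 200) for s in (0.1, 0.2, 0.8)])
fused = fuse_annotations(raters)
```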
The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates
TLDR
The Sub-Challenges, baseline feature extraction, and classifiers based on the ‘usual’ COMPARE and BoAW features as well as deep unsupervised representation learning using the AUDEEP toolkit, and deep feature extraction from pre-trained CNNs using the DEEP SPECTRUM toolkit are described.
Temporal Graph Convolutional Network for Multimodal Sentiment Analysis
TLDR
This paper uses positional encoding by interleaving sine and cosine embedding to encode the positions of the segments in the utterances into their features and creates an attention mechanism corresponding to the segments to capture the sentiment-related ones and obtain the unified embeddings of utterances.
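For reference, the interleaved sine/cosine positional encoding mentioned above is typically the standard Transformer-style formulation; the NumPy sketch below shows that standard form, while the paper's exact variant, dimensionality, and how it is added to segment features are assumptions here.

```python
import numpy as np

def sinusoidal_positional_encoding(n_positions: int, d_model: int) -> np.ndarray:
    """Interleaved sine/cosine positional encoding (assumes even d_model):
    PE[p, 2i]   = sin(p / 10000**(2i / d_model))
    PE[p, 2i+1] = cos(p / 10000**(2i / d_model))
    """
    positions = np.arange(n_positions)[:, None]                # (P, 1)
    dims = np.arange(0, d_model, 2)[None, :]                   # (1, d/2)
    angles = positions / np.power(10000.0, dims / d_model)     # (P, d/2)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Segment features of shape (n_segments, d_model) would be summed with pe
# before being passed to the attention / graph convolution layers.
pe = sinusoidal_positional_encoding(n_positions=50, d_model=64)
```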

References

SHOWING 1-10 OF 11 REFERENCES
A survey of multimodal sentiment analysis
TLDR
The thesis is that multimodal sentiment analysis holds a significant untapped potential with the arrival of complementary data streams for improving and going beyond text-based sentiment analysis.
AVEC 2013: the continuous audio/visual emotion and depression recognition challenge
TLDR
The third Audio-Visual Emotion recognition Challenge (AVEC 2013) has two goals logically organised as sub-challenges: the first is to predict the continuous values of the affective dimensions valence and arousal at each moment in time, and the second is to predict the value of a single depression indicator for each recording in the dataset.
AVEC 2017: Real-life Depression, and Affect Recognition Workshop and Challenge
TLDR
This paper presents the novelties introduced this year, the challenge guidelines, the data used, and the performance of the baseline system on the two proposed tasks: dimensional emotion recognition (time- and value-continuous), and dimensional depression estimation (value-continuous).
The INTERSPEECH 2020 Computational Paralinguistics Challenge: Elderly Emotion, Breathing & Masks
TLDR
The Sub-Challenges, baseline feature extraction, and classifiers based on the ‘usual’ COMPARE and BoAW features as well as deep unsupervised representation learning using the AUDEEP toolkit, and deep feature extraction from pre-trained CNNs using the DEEP SPECTRUM toolkit are described.
  • 2020
MuSe 2020 Challenge and Workshop: Multimodal Sentiment Analysis, Emotion-target Engagement and Trustworthiness Detection in Real-life Media
  • 2020
Integrating Multimodal Information in Large Pretrained Transformers
TLDR
Fine-tuning MAG-BERT and MAG-XLNet significantly boosts the sentiment analysis performance over previous baselines as well as language-only fine-tuning of BERT and XLNet.
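To give a sense of what such multimodal fine-tuning injects into the language model, below is a simplified sketch, in the spirit of a Multimodal Adaptation Gate, where nonverbal features produce a gated shift of the word-level embeddings; the class name, feature dimensions, and scaling scheme are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class MultimodalGate(nn.Module):
    """Gated fusion sketch: audio/visual features shift the text embeddings
    before they enter the Transformer layers. Illustrative only."""

    def __init__(self, d_text: int, d_audio: int, d_visual: int):
        super().__init__()
        self.gate_a = nn.Linear(d_text + d_audio, d_text)
        self.gate_v = nn.Linear(d_text + d_visual, d_text)
        self.proj_a = nn.Linear(d_audio, d_text)
        self.proj_v = nn.Linear(d_visual, d_text)
        self.norm = nn.LayerNorm(d_text)

    def forward(self, z, a, v):
        # Gates decide how much each nonverbal stream may shift the text embedding.
        g_a = torch.relu(self.gate_a(torch.cat([z, a], dim=-1)))
        g_v = torch.relu(self.gate_v(torch.cat([z, v], dim=-1)))
        shift = g_a * self.proj_a(a) + g_v * self.proj_v(v)
        return self.norm(z + shift)

# Token-aligned features: (batch, seq_len, dim) for text, audio, visual.
z = torch.randn(2, 20, 768); a = torch.randn(2, 20, 74); v = torch.randn(2, 20, 47)
fused = MultimodalGate(768, 74, 47)(z, a, v)   # -> (2, 20, 768)
```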