MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations

@inproceedings{Poria2019MELDAM,
  title={MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations},
  author={Soujanya Poria and Devamanyu Hazarika and Navonil Majumder and Gautam Naik and E. Cambria and Rada Mihalcea},
  booktitle={ACL},
  year={2019}
}
Emotion recognition in conversations is a challenging task that has recently gained popularity due to its potential applications. Until now, however, a large-scale multimodal multi-party emotional conversational database containing more than two speakers per dialogue was missing. Thus, we propose the Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines. MELD contains about 13,000 utterances from 1,433 dialogues from the TV-series Friends. Each utterance is… 
MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation
TLDR
A new model based on a multimodal fused graph convolutional network, MMGCN, is proposed, which can not only make use of multimodal dependencies effectively, but also leverage speaker information to model inter-speaker and intra-speaker dependencies.
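For orientation, the following is a minimal sketch of the general idea behind graph-based ERC: each utterance becomes a node, edges encode context-window and same-speaker relations, and a single graph convolution layer operates over fused multimodal features. This is not MMGCN's actual architecture; the graph construction, feature sizes, and helper names are illustrative assumptions.

# Minimal sketch of graph-based emotion recognition in conversation (ERC).
# NOT MMGCN's actual architecture; window size, dimensions, and the single
# GCN layer are illustrative assumptions.
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # Add self-loops, then symmetrically normalize the adjacency matrix.
        adj = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        adj_norm = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(adj_norm @ self.linear(x))

def build_conversation_graph(num_utts, speakers, window=2):
    """Connect utterances within a context window and same-speaker utterances."""
    adj = torch.zeros(num_utts, num_utts)
    for i in range(num_utts):
        for j in range(max(0, i - window), min(num_utts, i + window + 1)):
            adj[i, j] = 1.0
        for j in range(num_utts):
            if speakers[i] == speakers[j]:
                adj[i, j] = 1.0
    return adj

# Toy usage: 5 utterances, fused multimodal features of size 100, 7 MELD emotions.
feats = torch.randn(5, 100)                  # concatenated text/audio/visual features
adj = build_conversation_graph(5, ["A", "B", "A", "C", "B"])
gcn = SimpleGCNLayer(100, 64)
classifier = nn.Linear(64, 7)
logits = classifier(gcn(feats, adj))         # (5, 7) emotion logits per utterance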
MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations
TLDR
A large-scale balanced Multimodal Multi-label Emotion, Intensity, and Sentiment Dialogue dataset (MEISD), collected from different TV series that has textual, audio and visual features, is presented and a baseline setup is established for further research.
CTNet: Conversational Transformer Network for Emotion Recognition
  • Zheng Lian, B. Liu, J. Tao
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2021
TLDR
This work proposes a multimodal learning framework for conversational emotion recognition, called conversational transformer network (CTNet), which uses a transformer-based structure to model intra-modal and cross-modal interactions among multimodal features.
Towards Emotion-aided Multi-modal Dialogue Act Classification
TLDR
It is shown empirically that multi-modality and multi-tasking achieve better dialogue act classification (DAC) performance than uni-modal and single-task DAC variants; an attention-based multi-modal, multi-task deep neural network is built for joint learning of dialogue acts and emotions.
AdCOFE: Advanced Contextual Feature Extraction in conversations for emotion classification
TLDR
Experiments on emotion recognition in conversation (ERC) datasets show that AdCOFE is beneficial in capturing emotions in conversations.
CoMPM: Context Modeling with Speaker's Pre-trained Memory Tracking for Emotion Recognition in Conversation
TLDR
CoMPM is introduced, combining a context embedding module (CoM) with a pre-trained memory module (PM) that tracks the speaker's previous utterances within the context; the pre-trained memory is shown to significantly improve the final accuracy of emotion recognition.
Multi-Task Learning with Auxiliary Speaker Identification for Conversational Emotion Recognition
TLDR
This paper exploits speaker identification (SI) as an auxiliary task to enhance utterance representations in conversations, learning better speaker-aware contextual representations from the additional SI corpus.
Hierarchical Multimodal Transformer with Localness and Speaker Aware Attention for Emotion Recognition in Conversations
TLDR
A Hierarchical Multimodal Transformer is proposed as a base model for emotion recognition in conversations, together with a carefully designed localness-aware attention mechanism and a speaker-aware attention mechanism that respectively capture the impact of the local context and of emotional inertia.
Automatic Speech Recognition and Natural Language Understanding for Emotion Detection in Multi-party Conversations
TLDR
An approach that relies on commercially available products and services, such as Google Speech-to-Text, Mozilla DeepSpeech, and the NVIDIA NeMo toolkit, to process the audio and applies state-of-the-art NLU approaches for emotion recognition, in order to quickly create a robust, production-ready emotion-from-speech detection system applicable to multi-party conversations.
Contextualized Emotion Recognition in Conversation as Sequence Tagging
TLDR
A method that models the ERC task as sequence tagging, where a conditional random field (CRF) layer is leveraged to learn emotional consistency in the conversation; it outperforms the current state-of-the-art methods on multiple emotion classification datasets.
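As a rough illustration of the sequence-tagging framing, the sketch below turns utterance-level features into per-emotion emission scores and lets a CRF layer learn label transitions across the dialogue. It is a generic reconstruction, not the cited paper's model; the GRU encoder, dimensions, and the pytorch-crf dependency (pip install pytorch-crf) are assumptions.

# Minimal sketch of ERC framed as sequence tagging with a CRF over the dialogue.
# Generic illustration only; encoder choice and sizes are assumptions.
import torch
import torch.nn as nn
from torchcrf import CRF

NUM_EMOTIONS = 7  # e.g. MELD's seven emotion classes

class CRFTagger(nn.Module):
    def __init__(self, feat_dim, hidden_dim=128):
        super().__init__()
        # Bidirectional GRU gives each utterance a context-aware representation.
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden_dim, NUM_EMOTIONS)
        self.crf = CRF(NUM_EMOTIONS, batch_first=True)

    def loss(self, utt_feats, labels):
        h, _ = self.encoder(utt_feats)
        emissions = self.emit(h)
        return -self.crf(emissions, labels)      # negative log-likelihood

    def predict(self, utt_feats):
        h, _ = self.encoder(utt_feats)
        return self.crf.decode(self.emit(h))     # best label sequence per dialogue

# Toy usage: one dialogue of 6 utterances with 300-dim utterance features.
model = CRFTagger(feat_dim=300)
feats = torch.randn(1, 6, 300)
labels = torch.randint(0, NUM_EMOTIONS, (1, 6))
nll = model.loss(feats, labels)
pred = model.predict(feats)                      # e.g. [[2, 2, 5, 0, 0, 3]]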

References

Showing 1-10 of 35 references
EmotionLines: An Emotion Corpus of Multi-Party Conversations
TLDR
EmotionLines is introduced, the first dataset with emotion labels on all utterances in each dialogue, based solely on their textual content; several strong baselines for emotion detection on EmotionLines are provided.
ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection
TLDR
Interactive COnversational memory Network (ICON) is proposed: a multimodal emotion detection framework that extracts multimodal features from conversational videos and hierarchically models self- and inter-speaker emotional influences into global memories to aid in predicting the emotional orientation of utterance-videos.
Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos
TLDR
A deep neural framework is proposed, termed conversational memory network, which leverages contextual information from the conversation history to recognize utterance-level emotions in dyadic conversational videos.
Utterance-Level Multimodal Sentiment Analysis
TLDR
It is shown that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.
DialogueRNN: An Attentive RNN for Emotion Detection in Conversations
TLDR
A new method based on recurrent neural networks that keeps track of the individual party states throughout the conversation and uses this information for emotion classification; it outperforms the state of the art by a significant margin on two different datasets.
Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances
TLDR
The research challenges in ERC are discussed, the drawbacks of current approaches are described, and the reasons why they fail to overcome these challenges are examined.
Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset
TLDR
This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations, and presents empirical comparisons of dialogue model adaptations for empathetic responding, leveraging existing models or datasets without requiring lengthy re-training of the full model.
Tensor Fusion Network for Multimodal Sentiment Analysis
TLDR
A novel model, termed Tensor Fusion Network, is introduced, which learns intra-modality and inter-modality dynamics end-to-end in sentiment analysis and outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis.
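The core tensor-fusion idea, fusing modalities through an outer product of 1-augmented embeddings so that unimodal, bimodal, and trimodal interactions all appear in a single tensor, can be sketched as below. Dimensions and the final classifier are illustrative assumptions, not the published configuration.

# Minimal sketch of outer-product ("tensor") fusion of three modality embeddings.
# Appending a constant 1 to each vector keeps unimodal and bimodal terms inside
# the trimodal tensor. Sizes and classifier are illustrative assumptions.
import torch
import torch.nn as nn

def tensor_fusion(text, audio, visual):
    # text: (B, dt), audio: (B, da), visual: (B, dv)
    ones = lambda x: torch.cat([x, torch.ones(x.size(0), 1)], dim=1)
    t, a, v = ones(text), ones(audio), ones(visual)
    # Outer product over the three modalities -> (B, dt+1, da+1, dv+1), then flatten.
    fused = torch.einsum('bi,bj,bk->bijk', t, a, v)
    return fused.flatten(start_dim=1)

# Toy usage: small unimodal embeddings fused and classified into 2 sentiment classes.
text, audio, visual = torch.randn(4, 8), torch.randn(4, 4), torch.randn(4, 4)
fused = tensor_fusion(text, audio, visual)          # (4, 9*5*5) = (4, 225)
clf = nn.Linear(fused.size(1), 2)
logits = clf(fused)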
AVEC 2012: the continuous audio/visual emotion challenge
TLDR
The challenge guidelines, the common data used, and the performance of the baseline system on the two tasks are presented.
Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages
TLDR
This article addresses the fundamental question of exploiting the dynamics between visual gestures and verbal messages to better model sentiment, introducing the first multimodal dataset with opinion-level sentiment intensity annotations and proposing a new computational representation, called a multimodal dictionary, based on a language-gesture study.