MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations

Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, Rada Mihalcea
Emotion recognition in conversations is a challenging task that has recently gained popularity due to its potential applications. Until now, however, a large-scale multimodal multi-party emotional conversational database containing more than two speakers per dialogue was missing. Thus, we propose the Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines. MELD contains about 13,000 utterances from 1,433 dialogues from the TV series Friends. Each utterance is…

M-MELD: A Multilingual Multi-Party Dataset for Emotion Recognition in Conversations

This paper extends the Multimodal EmotionLines Dataset (MELD) to 4 other languages beyond English, namely Greek, Polish, French, and Spanish, and proposes a novel architecture, DiscLSTM, that uses both sequential and conversational discourse context in a conversational dialogue for ERC.

MMGCN: Multimodal Fusion via Deep Graph Convolution Network for Emotion Recognition in Conversation

A new model based on a multimodal fused graph convolutional network, MMGCN, is proposed, which can not only make use of multimodal dependencies effectively, but also leverage speaker information to model inter-speaker and intra-speaker dependencies.

MEISD: A Multimodal Multi-Label Emotion, Intensity and Sentiment Dialogue Dataset for Emotion Recognition and Sentiment Analysis in Conversations

A large-scale balanced Multimodal Multi-label Emotion, Intensity, and Sentiment Dialogue dataset (MEISD), collected from different TV series and containing textual, audio, and visual features, is presented, and a baseline setup is established for further research.

M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database

A general Multimodal Dialogue-aware Interaction framework, MDI, is proposed to model the dialogue context for emotion recognition, which achieves performance comparable to state-of-the-art methods on M3ED.

M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation

A Multi-modal Fusion Network (M2FNet) is proposed that extracts emotion-relevant features from the visual, audio, and text modalities and employs a multi-head attention-based fusion mechanism to combine emotion-rich latent representations of the input data.

Multimodal Attentive Learning for Real-time Explainable Emotion Recognition in Conversations

A novel contrastive-loss-based optimization framework keeps track of the spatio-temporal states of the participants and their conversation dynamics; it shows promise in identifying the emotional state of an individual speaker in real time and can identify the top-k words in the conversation that influence emotion recognition.

Towards Emotion-aided Multi-modal Dialogue Act Classification

An attention-based multi-modal, multi-task deep neural network is built for joint learning of dialogue acts and emotions, and it is shown empirically that multi-modality and multi-tasking achieve better DAC performance than uni-modal and single-task DAC variants.

AdCOFE: Advanced Contextual Feature Extraction in conversations for emotion classification

Experiments on emotion recognition in conversations datasets show that AdCOFE is beneficial in capturing emotions in conversations.

Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition

A contextual transformer module introduces contextual information by embedding the previous utterances between interlocutors, which enhances the emotion representation of the current utterance, while a cross-modal transformer module focuses on the interactions between the text and audio modalities, adaptively promoting fusion from one modality to the other.

Korean Drama Scene Transcript Dataset for Emotion Recognition in Conversations

A context-aware deep learning model is developed to classify emotions using speaker-level context and scene context, and achieves an F1-score of 0.63 on the proposed dataset.

EmotionLines: An Emotion Corpus of Multi-Party Conversations

EmotionLines is introduced, the first dataset with emotion labels on all utterances in each dialogue based solely on their textual content, and several strong baselines for emotion detection models on EmotionLines are provided.

ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection

Interactive COnversational memory Network (ICON), a multi-modal emotion detection framework that extracts multimodal features from conversational videos and hierarchically models the self- and inter-speaker emotional influences into global memories to aid in predicting the emotional orientation of utterance-videos.

Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos

A deep neural framework is proposed, termed conversational memory network, which leverages contextual information from the conversation history to recognize utterance-level emotions in dyadic conversational videos.

Utterance-Level Multimodal Sentiment Analysis

It is shown that multimodal sentiment analysis can be effectively performed, and that the joint use of visual, acoustic, and linguistic modalities can lead to error rate reductions of up to 10.5% as compared to the best performing individual modality.

DialogueRNN: An Attentive RNN for Emotion Detection in Conversations

A new method based on recurrent neural networks that keeps track of the individual party states throughout the conversation and uses this information for emotion classification and outperforms the state of the art by a significant margin on two different datasets.
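The party-state idea described above can be sketched as follows. This is a toy illustration, not the paper's implementation: random weights and a tanh update stand in for DialogueRNN's GRU cells and attention, and all names are hypothetical. The key point it demonstrates is that each speaker's state vector is updated only on that speaker's own turns.

```python
import numpy as np

def run_dialogue(utterances, speakers, dim=8, seed=0):
    """Toy sketch of per-party state tracking: one state vector per
    speaker, updated only when that speaker talks; the current
    utterance representation comes from its speaker's state."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, dim)) * 0.1  # toy recurrent weights
    U = rng.standard_normal((dim, dim)) * 0.1  # toy input weights
    states = {}   # party -> current state vector
    outputs = []  # per-utterance representations for a classifier
    for x, s in zip(utterances, speakers):
        h = states.get(s, np.zeros(dim))
        h = np.tanh(W @ h + U @ x)  # simplified GRU-like update
        states[s] = h
        outputs.append(h)
    return outputs, states

# Four utterances alternating between two speakers:
xs = [np.ones(8)] * 4
outs, parties = run_dialogue(xs, ["A", "B", "A", "B"])
print(len(outs), sorted(parties))  # 4 ['A', 'B']
```

A real model would replace the tanh update with learned GRUs, add a global context state, and feed each output vector to an emotion classifier.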

Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances

The research challenges in ERC are discussed, the drawbacks of current approaches are described, and the reasons why these approaches fail to overcome the research challenges are examined.

Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset

This work proposes a new benchmark for empathetic dialogue generation and EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations, and presents empirical comparisons of dialogue model adaptations for empathetic responding, leveraging existing models or datasets without requiring lengthy re-training of the full model.

Tensor Fusion Network for Multimodal Sentiment Analysis

A novel model, termed Tensor Fusion Network, is introduced, which learns intra-modality and inter-modality dynamics end-to-end in sentiment analysis and outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis.
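The inter-modality dynamics in tensor fusion are captured by a three-way outer product of the unimodal embeddings, each padded with a constant 1 so that unimodal and bimodal interaction terms survive in the fused tensor. A minimal sketch of that fusion step (the function name and toy dimensions are assumptions for illustration):

```python
import numpy as np

def tensor_fusion(h_text, h_audio, h_video):
    """Fuse three unimodal embeddings via an outer product.

    Appending 1 to each vector keeps the original unimodal features
    and all pairwise (bimodal) products as sub-blocks of the result.
    """
    t = np.append(h_text, 1.0)
    a = np.append(h_audio, 1.0)
    v = np.append(h_video, 1.0)
    fused = np.einsum("i,j,k->ijk", t, a, v)  # 3-way outer product
    return fused.ravel()  # flatten for a downstream classifier

# Toy embeddings of sizes 4, 3, and 2 -> fused vector of size 5*4*3:
z = tensor_fusion(np.ones(4), np.ones(3), np.ones(2))
print(z.shape)  # (60,)
```

The quadratic-in-dimensions growth of the fused vector is why later work (e.g. low-rank fusion variants) factorizes this tensor.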

Multimodal Sentiment Intensity Analysis in Videos: Facial Gestures and Verbal Messages

This article addresses the fundamental question of exploiting the dynamics between visual gestures and verbal messages to be able to better model sentiment by introducing the first multimodal dataset with opinion-level sentiment intensity annotations and proposing a new computational representation, called multi-modal dictionary, based on a language-gesture study.

Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory

This paper proposes the Emotional Chatting Machine (ECM), the first work to address the emotion factor in large-scale conversation generation, using three new mechanisms that respectively model the high-level abstraction of emotion expressions by embedding emotion categories.