Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances

@article{Poria2019EmotionRI,
  title={Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances},
  author={Soujanya Poria and Navonil Majumder and Rada Mihalcea and Eduard H. Hovy},
  journal={IEEE Access},
  year={2019},
  volume={7},
  pages={100943--100953}
}
Emotion is intrinsic to humans and, consequently, emotion understanding is a key part of human-like artificial intelligence (AI). Emotion recognition in conversation (ERC) is becoming increasingly popular as a new research frontier in natural language processing (NLP) due to its ability to mine opinions from the plethora of publicly available conversational data on platforms such as Facebook, YouTube, Reddit, Twitter, and others. Moreover, it has potential applications in health-care systems (as…

Emotion Recognition with Conversational Generation Transfer

TLDR
An Emotion Recognition with Conversational Generation Transfer (ERCGT) framework is proposed to model the interaction among utterances via transfer learning; empirical studies illustrate the effectiveness of the proposed framework over several strong baselines on three benchmark emotion classification datasets.

Automatic Speech Recognition and Natural Language Understanding for Emotion Detection in Multi-party Conversations

TLDR
An approach which relies on commercially available products and services, such as Google Speech-to-Text, Mozilla DeepSpeech and the NVIDIA NeMo toolkit to process the audio and applies state-of-the-art NLU approaches for emotion recognition, in order to quickly create a robust, production-ready emotion-from-speech detection system applicable to multi-party conversations.

Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts

TLDR
Results show that including an emotion-shift signal helps the model outperform existing multimodal models for ERC, achieving state-of-the-art performance on the MOSEI and IEMOCAP datasets.

Contextual Information and Commonsense Based Prompt for Emotion Recognition in Conversation

TLDR
This work proposes CISPER, a novel ERC model with the new paradigm of prompt and language model (LM) tuning; equipped with prompts that blend contextual information and commonsense related to the interlocutor's utterances, it achieves ERC more effectively.

ArmanEmo: A Persian Dataset for Text-based Emotion Detection

TLDR
This study introduces ArmanEmo, a human-labeled emotion dataset of more than 7000 Persian sentences labeled for seven categories based on Ekman’s six basic emotions, and provides several baseline models for emotion classification focusing on the state-of-the-art transformer-based language models.

Exploiting Unsupervised Data for Emotion Recognition in Conversations

TLDR
This paper proposes the Conversation Completion (ConvCom) task, which attempts to select the correct answer from candidate answers to fill a masked utterance in a conversation, and introduces a basic COntext-Dependent Encoder built on this task.

Utilizing External Knowledge to Enhance Semantics in Emotion Detection in Conversation

TLDR
The KES model is proposed, a new framework that incorporates different elements of external knowledge and conversational semantic role labeling to learn the interactions between interlocutors participating in a conversation; it outperforms state-of-the-art approaches on most of the tested datasets.
...

References

SHOWING 1-10 OF 47 REFERENCES

EmotionLines: An Emotion Corpus of Multi-Party Conversations

TLDR
EmotionLines is introduced, the first dataset with emotion labels on all utterances in each dialogue based solely on their textual content, and several strong baselines for emotion detection models on EmotionLines are provided.

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations

TLDR
The Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines, contains about 13,000 utterances from 1,433 dialogues from the TV series Friends and shows the importance of contextual and multimodal information for emotion recognition in conversations.

Decision support with text-based emotion recognition: Deep learning for affective computing

TLDR
This work adapts recurrent neural networks from the field of deep learning to affective computing and extends these networks for predicting the score of different affective dimensions, and implements transfer learning for pre-training word embeddings.

Emotion Recognition from Text Based on Automatically Generated Rules

TLDR
This work proposes a framework for emotion classification in English sentences in which emotions are treated as generalized concepts extracted from the sentences; it significantly outperformed existing state-of-the-art machine learning and rule-based classifiers.

ICON: Interactive Conversational Memory Network for Multimodal Emotion Detection

TLDR
Interactive COnversational memory Network (ICON), a multi-modal emotion detection framework that extracts multimodal features from conversational videos and hierarchically models the self- and inter-speaker emotional influences into global memories to aid in predicting the emotional orientation of utterance-videos.

EPITA-ADAPT at SemEval-2019 Task 3: Detecting emotions in textual conversations using deep learning models combination

TLDR
This paper presents the authors' submission for the SemEval 2019 task 'EmoContext', which consists of classifying a given textual dialogue into one of four emotion classes (Angry, Happy, Sad, and Others) based on a combination of different deep neural network techniques.

Emotion Recognition on Twitter: Comparative Study and Training a Unison Model

TLDR
It is shown that recurrent neural networks, especially character-based ones, can improve over bag-of-words and latent semantic indexing models and that the newly proposed training heuristic produces a unison model with performance comparable to that of the three single models.

Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos

TLDR
A deep neural framework is proposed, termed conversational memory network, which leverages contextual information from the conversation history to recognize utterance-level emotions in dyadic conversational videos.