Multilogue-Net: A Context-Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation

Aman Shenoy and Ashish Sardana
Sentiment analysis and emotion detection in conversation are key to several real-world applications, and an increase in available modalities aids a better understanding of the underlying emotions. Multi-modal emotion detection and sentiment analysis can be particularly useful, as applications are able to use whichever subset of modalities the available data provides. Current multi-modal systems fail to leverage and capture the context of the…


A Transformer-based joint-encoding for Emotion Recognition and Sentiment Analysis

A Transformer-based joint-encoding (TBJE) for the task of Emotion Recognition and Sentiment Analysis that relies on a modular co-attention and a glimpse layer to jointly encode one or more modalities.

Sentiment analysis in textual, visual and multimodal inputs using recurrent neural networks

An extensive review of the applicability, challenges, issues, and approaches concerning the role of sequential deep neural networks in sentiment analysis of multimodal data using RNNs and their architectural variants is presented.

Modulated Fusion using Transformer for Linguistic-Acoustic Emotion Recognition

This paper proposes two architectures based on Transformers and modulation that combine the linguistic and acoustic inputs from a wide range of datasets to challenge, and sometimes surpass, the state-of-the-art in the field of Emotion Recognition and Sentiment Analysis.

Detecting Hate Speech in Multi-modal Memes

This paper addresses the Facebook Meme Challenge, a binary classification problem of predicting whether a meme is hateful or not, and tackles the benign text confounders present in the dataset to improve performance.

Analyzing Unaligned Multimodal Sequence via Graph Convolution and Graph Pooling Fusion

A novel model, termed Multimodal Graph, is proposed to investigate the effectiveness of graph neural networks (GNN) on modeling multimodal sequential data and devise a graph pooling fusion network to automatically learn the associations between various nodes from different modalities.

Unimodal and Crossmodal Refinement Network for Multimodal Sequence Fusion

Experimental results on MOSI and MOSEI datasets illustrated that the proposed UCRN outperforms recent state-of-the-art techniques and its robustness is highly preferred in real multimodal sequence fusion scenarios.

A Survey of Sentiment Analysis Based on Deep Learning

This paper surveys sentiment analysis methods based on convolutional neural networks, recurrent neural networks, long short-term memory, deep neural networks, deep belief networks, and memory networks, and points out the main problems of these methods.

COGMEN: COntextualized GNN based Multimodal Emotion recognitioN

The proposed model uses a Graph Neural Network (GNN) based architecture to model the complex dependencies in a conversation and gives state-of-the-art (SOTA) results on the IEMOCAP and MOSEI datasets, and detailed ablation experiments show the importance of modeling information at both levels.

Multimodal Sentiment Analysis Based on Interactive Transformer and Soft Mapping

An Interactive Transformer and Soft Mapping based method is proposed that fully considers the relationships between multiple modal pieces of information and provides a new solution to the problem of data interaction in multimodal sentiment analysis.

Shapes of Emotions: Multimodal Emotion Recognition in Conversations via Emotion Shifts

Results show that the inclusion of an emotion shift signal helps the model outperform existing multimodal models for ERC, achieving state-of-the-art performance on the MOSEI and IEMOCAP datasets.

Contextual Inter-modal Attention for Multi-modal Sentiment Analysis

A recurrent neural network based multi-modal attention framework that leverages contextual information for utterance-level sentiment prediction; it applies attention over multi-modal multi-utterance representations and tries to learn the contributing features among them.

Tensor Fusion Network for Multimodal Sentiment Analysis

A novel model, termed Tensor Fusion Network, is introduced, which learns intra-modality and inter-modality dynamics end-to-end in sentiment analysis and outperforms state-of-the-art approaches for both multimodal and unimodal sentiment analysis.

Multimodal sentiment analysis with word-level fusion and reinforcement learning

The Gated Multimodal Embedding LSTM with Temporal Attention model is proposed, which is composed of two modules, performs modality fusion at the word level, and is able to better model the multimodal structure of speech through time for improved sentiment comprehension.

Context-Dependent Sentiment Analysis in User-Generated Videos

An LSTM-based model is proposed that enables utterances to capture contextual information from their surroundings in the same video, thus aiding the classification process, showing 5-10% performance improvement over the state of the art and high robustness in generalizability.

DialogueRNN: An Attentive RNN for Emotion Detection in Conversations

A new method based on recurrent neural networks that keeps track of the individual party states throughout the conversation and uses this information for emotion classification and outperforms the state of the art by a significant margin on two different datasets.

Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph

This paper introduces CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI), the largest dataset of sentiment analysis and emotion recognition to date, and uses a novel multimodal fusion technique called the Dynamic Fusion Graph (DFG), which is highly interpretable and achieves competitive performance compared to the previous state of the art.

Conversational Memory Network for Emotion Recognition in Dyadic Dialogue Videos

A deep neural framework is proposed, termed conversational memory network, which leverages contextual information from the conversation history to recognize utterance-level emotions in dyadic conversational videos.

Convolutional MKL Based Multimodal Emotion Recognition and Sentiment Analysis

A novel method is presented that extracts features from the visual and textual modalities using deep convolutional neural networks and significantly outperforms the state of the art in multimodal emotion recognition and sentiment analysis on different datasets.

Emotion Recognition in Conversation: Research Challenges, Datasets, and Recent Advances

The research challenges in ERC are discussed, the drawbacks of existing approaches are described, and the reasons why they fail to overcome these challenges are examined.

MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos

This paper introduces to the scientific community the first opinion-level annotated corpus for sentiment and subjectivity analysis in online videos, called the Multimodal Opinion-level Sentiment Intensity dataset (MOSI), which is rigorously annotated with labels for subjectivity and sentiment intensity, per-frame and per-opinion annotated visual features, and per-millisecond annotated audio features.