MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets
Shraman Pramanick, Shivam Sharma, Dimitar I. Dimitrov, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty
Internet memes have become powerful means to transmit political, psychological, and sociocultural ideas. Although memes are typically humorous, recent days have witnessed an escalation of harmful memes used for trolling, cyberbullying, and abusing social entities. Detecting such harmful memes is challenging as they can be highly satirical and cryptic. Moreover, while previous work has focused on specific aspects of memes such as hate speech and propaganda, there has been little work on harm in… 
Nipping in the Bud: Detection, Diffusion and Mitigation of Hate Speech on Social Media
This article presents methodological challenges that hinder building automated hate mitigation systems and discusses a series of proposed solutions to limit the spread of hate speech on social media.
Detecting Harmful Memes and Their Targets
This work presents HarMeme, the first benchmark dataset, containing 3,544 memes related to COVID-19, and proposes two novel problem formulations: detecting harmful memes and the social entities that these harmful memes target.
Multimodal Learning for Hateful Memes Detection
Yi Zhou and Zhenhao Chen. 2021 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2021.
This paper proposes a novel method that incorporates the image captioning process into the memes detection process and achieves promising results on the Hateful Memes Detection Challenge.
Multimodal Meme Dataset (MultiOFF) for Identifying Offensive Content in Image and Text
An early fusion technique is used to combine the image and text modalities and is compared with text-only and image-only baselines to investigate its effectiveness; results show improvements in terms of Precision, Recall, and F-Score.
Hate Speech in Pixels: Detection of Offensive Memes towards Automatic Moderation
This work addresses the challenge of hate speech detection in Internet memes and, unlike any previous work to the authors' knowledge, attempts to use visual information to automatically detect hate speech.
SemEval-2020 Task 8: Memotion Analysis - The Visuo-Lingual Metaphor!
The objective of this proposal is to bring the attention of the research community to the automatic processing of Internet memes by classifying the intensity of meme emotion.
A Multimodal Framework for the Detection of Hateful Memes
This work improves the performance of existing multimodal approaches beyond simple fine-tuning and shows the effectiveness of upsampling of contrastive examples to encourage multimodality and ensemble learning based on cross-validation to improve robustness.
Detecting Medical Misinformation on Social Media Using Multimodal Deep Learning
A new semantic- and task-level attention mechanism was created to help the model focus on the essential contents of a post that signal antivaccine messages; the final network achieves above 97% testing accuracy and outperforms other relevant models, demonstrating that it can detect the large volume of antivaccine messages posted daily.
Findings of the Shared Task on Troll Meme Classification in Tamil
The internet has facilitated its user-base with a platform to communicate and express their views without any censorship. On the other hand, this freedom of expression or free speech can be abused by…
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge
This work utilizes VisualBERT, which is meant to be "the BERT of vision and language" and was trained multimodally on images and captions, and applies ensemble learning to detect hate speech in multimodal memes.
Multi-Modal Sarcasm Detection in Twitter with Hierarchical Fusion Model
This work treats text features, image features and image attributes as three modalities and proposes a multi-modal hierarchical fusion model to address sarcasm detection for tweets consisting of texts and images in Twitter.