MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset
@article{Nielsen2022MuMiNAL, title={MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset}, author={Dan Saattrup Nielsen and Ryan McConville}, journal={ArXiv}, year={2022}, volume={abs/2202.11684} }
Misinformation is becoming increasingly prevalent on social media and in news articles. It has become so widespread that we require algorithmic assistance utilising machine learning to detect such content. Training these machine learning models requires datasets of sufficient scale, diversity and quality. However, datasets in the field of automatic misinformation detection are predominantly monolingual, include a limited number of modalities and are not of sufficient scale and quality…
One Citation
Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities
- Computer Science
- ArXiv
- 2022
This work aims to analyze, categorize and identify existing approaches in addition to challenges and shortcomings they face in order to unearth new opportunities in furthering the research in the field of multi-modal misinformation detection.
References
Showing 1-10 of 48 references
Multimodal Fusion with Recurrent Neural Networks for Rumor Detection on Microblogs
- Computer Science
- ACM Multimedia
- 2017
A novel recurrent neural network with an attention mechanism (att-RNN) is proposed to fuse multimodal features for effective rumor detection; the results demonstrate the effectiveness of the proposed end-to-end att-RNN in detecting rumors with multimodal content.
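The summary names attention-based fusion of multimodal features without spelling it out. As a rough, hypothetical illustration (a generic technique, not the att-RNN architecture itself), the sketch below uses a text feature vector as the query for dot-product attention over a set of image-region feature vectors, then concatenates the text vector with the attention-weighted visual vector. The function names and toy vectors are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max score before exponentiating)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, regions: List[Vector]) -> Tuple[Vector, List[float]]:
    """Dot-product attention: weight each region by its similarity to the query."""
    # Assumes the query and every region vector share the same dimensionality.
    scores = [sum(q * r for q, r in zip(query, region)) for region in regions]
    weights = softmax(scores)
    attended = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(len(regions[0]))
    ]
    return attended, weights


def fuse(text_vec: Vector, regions: List[Vector]) -> Tuple[Vector, List[float]]:
    """Hypothetical fusion step: concatenate text features with attended visual features."""
    attended, weights = attend(text_vec, regions)
    return text_vec + attended, weights


# Toy example: the text vector is more similar to the first image region.
fused, weights = fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

A downstream classifier (for example, a softmax layer) would take `fused` as input; that step is omitted here.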
KI2TE: Knowledge-Infused InterpreTable Embeddings for COVID-19 Misinformation Detection
- Computer Science
- KnOD@WWW
- 2021
This work proposes a novel, preliminary method for identifying fake articles and claims using information from the CORD-19 academic paper dataset; the method classifies articles by their similarity to reference manuscripts in a shared embedding space.
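Classifying an article by its similarity to reference manuscripts in a shared embedding space can be sketched with cosine similarity. The decision rule below (an article counts as "supported" if its closest reference exceeds a similarity threshold), the 0.8 threshold, the function names and the toy vectors are all assumptions for illustration; the paper's actual encoder and classifier are not reproduced here.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_by_reference(
    article_vec: Vector,
    reference_vecs: Dict[str, Vector],  # must be non-empty
    threshold: float = 0.8,  # hypothetical cut-off
) -> Tuple[str, str, float]:
    """Return (label, closest reference id, similarity) under the assumed rule."""
    best_id, best_score = max(
        ((ref_id, cosine(article_vec, ref_vec)) for ref_id, ref_vec in reference_vecs.items()),
        key=lambda pair: pair[1],
    )
    label = "supported" if best_score >= threshold else "unsupported"
    return label, best_id, best_score


# Toy embeddings standing in for encoded reference manuscripts.
references = {"paper1": [1.0, 0.0, 0.0], "paper2": [0.0, 1.0, 0.0]}
close = classify_by_reference([0.9, 0.1, 0.0], references)
far = classify_by_reference([0.5, 0.5, 0.7], references)
```

In practice the embeddings would come from a shared text encoder applied to both articles and manuscripts; a learned classifier could replace the fixed threshold.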
"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection
- Computer Science
- ACL
- 2017
This paper presents LIAR, a new, publicly available dataset for fake news detection, and designs a novel hybrid convolutional neural network that integrates metadata with text to improve on a text-only deep learning model.
FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media
- Computer Science
- Big Data
- 2020
The fake news data repository FakeNewsNet is presented; it contains two comprehensive datasets with diverse features in news content, social context, and spatiotemporal information, and its potential applications for studying fake news on social media are discussed.
MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims
- Computer Science
- EMNLP
- 2019
An in-depth analysis is presented of the largest publicly available dataset of naturally occurring factual claims, collected from 26 fact-checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists.
X-FACT: A New Benchmark Dataset for Multilingual Fact Checking
- Computer Science
- ACL/IJCNLP
- 2021
The largest publicly available multilingual dataset for factual verification of naturally existing real-world claims is introduced, and several automated fact-checking models are developed that use additional metadata and evidence from news stories retrieved using a search engine.
FEVER: a Large-scale Dataset for Fact Extraction and VERification
- Computer Science
- NAACL
- 2018
This paper introduces a new publicly available dataset for verification against textual sources, FEVER, which consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from.
MM-COVID: A Multilingual and Multidimensional Data Repository for Combating COVID-19 Fake News
- Computer Science
- 2020
A new fake news detection dataset, MM-COVID (Multilingual and Multidimensional COVID-19 Fake News Data Repository), is proposed; it provides multilingual fake news together with the relevant social context, and its utility is demonstrated in several potential applications for studying COVID-19 fake news in multilingual settings and on social media.
Inductive Representation Learning on Large Graphs
- Computer Science
- NIPS
- 2017
GraphSAGE, a general inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data, is presented; it outperforms strong baselines on three inductive node-classification benchmarks.
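To make "inductive" concrete, here is a minimal, untrained sketch of one GraphSAGE-style layer with a mean aggregator in plain Python: each node's features are concatenated with the mean of its (possibly sampled) neighbours' features, multiplied by a weight matrix, passed through a ReLU and L2-normalised. The weight matrix, toy graph and helper names (`sage_layer`, `l2_normalize`, etc.) are hypothetical, and the sampling rule is a simplification; a real model learns the weights and stacks several layers.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def matvec(matrix: List[Vector], x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def l2_normalize(x: Vector) -> Vector:
    """Scale x to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else x


def sage_layer(
    features: Dict[str, Vector],
    adjacency: Dict[str, List[str]],
    weights: List[Vector],  # hypothetical, shape out_dim x (2 * in_dim)
    num_samples: int,
    rng: random.Random,
) -> Dict[str, Vector]:
    """One GraphSAGE-style layer with a mean aggregator (simplified sketch)."""
    embeddings = {}
    for node, h_self in features.items():
        neighbors = adjacency.get(node, [])
        # Simplification: subsample only when a node has more than num_samples neighbours.
        if len(neighbors) > num_samples:
            neighbors = rng.sample(neighbors, num_samples)
        # Mean-aggregate neighbour features; isolated nodes get a zero vector.
        h_neigh = mean([features[u] for u in neighbors]) if neighbors else [0.0] * len(h_self)
        # Concatenate self and neighbourhood vectors, apply W, then ReLU.
        hidden = [max(0.0, v) for v in matvec(weights, h_self + h_neigh)]
        embeddings[node] = l2_normalize(hidden)
    return embeddings


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
weights = [[0.5, 0.1, 0.2, 0.3], [0.1, 0.4, 0.3, 0.2], [0.2, 0.2, 0.2, 0.2]]
emb = sage_layer(features, adjacency, weights, num_samples=2, rng=random.Random(0))

# A node "d" that was never seen before is embedded with the same weights.
emb_new = sage_layer(
    {**features, "d": [0.5, 0.5]},
    {**adjacency, "d": ["a", "c"]},
    weights,
    num_samples=2,
    rng=random.Random(0),
)
```

Because the layer depends only on node features and neighbour lists rather than a per-node lookup table, the same weights can embed nodes absent when they were fitted, which is the inductive property the summary describes.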
Automatic Detection of Fake News
- Computer Science
- COLING
- 2018
This paper introduces two novel datasets for the task of fake news detection, covering seven different news domains, and conducts a set of learning experiments to build accurate fake news detectors that can achieve accuracies of up to 76%.