Corpus ID: 247058367

MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset

@article{Nielsen2022MuMiNAL,
  title={MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset},
  author={Dan Saattrup Nielsen and Ryan McConville},
  journal={ArXiv},
  year={2022},
  volume={abs/2202.11684}
}
Misinformation is becoming increasingly prevalent on social media and in news articles. It has become so widespread that we require algorithmic assistance utilising machine learning to detect such content. Training these machine learning models requires datasets of sufficient scale, diversity and quality. However, datasets in the field of automatic misinformation detection are predominantly monolingual, include a limited number of modalities and are not of sufficient scale and quality…
1 Citation
Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities
TLDR
This work aims to analyze, categorize and identify existing approaches in addition to challenges and shortcomings they face in order to unearth new opportunities in furthering the research in the field of multi-modal misinformation detection.

References

SHOWING 1-10 OF 48 REFERENCES
Multimodal Fusion with Recurrent Neural Networks for Rumor Detection on Microblogs
TLDR
A novel Recurrent Neural Network with an attention mechanism (att-RNN) is proposed to fuse multimodal features for effective rumor detection, and the results demonstrate the effectiveness of the proposed end-to-end att-RNN in detecting rumors with multi-modal contents.
KI2TE: Knowledge-Infused InterpreTable Embeddings for COVID-19 Misinformation Detection
TLDR
This work proposes a preliminary novel method to identify fake articles and claims by using information from the CORD-19 academic paper dataset, which uses the similarity between articles and reference manuscripts in a shared embedding space to classify the articles.
"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection
TLDR
This paper presents LIAR, a new, publicly available dataset for fake news detection, and designs a novel, hybrid convolutional neural network to integrate meta-data with text to improve a text-only deep learning model.
FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media
TLDR
A fake news data repository, FakeNewsNet, is presented, which contains two comprehensive data sets with diverse features in news content, social context, and spatiotemporal information, and its potential applications to the study of fake news on social media are discussed.
MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims
TLDR
An in-depth analysis of the largest publicly available dataset of naturally occurring factual claims, collected from 26 fact checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists is presented.
X-FACT: A New Benchmark Dataset for Multilingual Fact Checking
TLDR
The largest publicly available multilingual dataset for factual verification of naturally existing real-world claims is introduced and several automated fact-checking models are developed that make use of additional metadata and evidence from news stories retrieved using a search engine.
FEVER: a Large-scale Dataset for Fact Extraction and VERification
TLDR
This paper introduces a new publicly available dataset for verification against textual sources, FEVER, which consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from.
MM-COVID: A Multilingual and Multidimensional Data Repository for Combating COVID-19 Fake News
TLDR
A new fake news detection dataset MM-COVID (Multilingual and Multidimensional COVID-19 Fake News Data Repository) is proposed that provides the multilingual fake news and the relevant social context and is demonstrated to have utility in several potential applications of COVID-19 fake news study on multilingual and social media.
Inductive Representation Learning on Large Graphs
TLDR
GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.
Automatic Detection of Fake News
TLDR
This paper introduces two novel datasets for the task of fake news detection, covering seven different news domains, and conducts a set of learning experiments to build accurate fake news detectors that can achieve accuracies of up to 76%.