• Corpus ID: 243832767

Dataset of Fake News Detection and Fact Verification: A Survey

Taichi Murayama
The rapid increase in fake news, which causes significant damage to society, triggers many fake news related studies, including the development of fake news detection and fact verification techniques. The resources for these studies are mainly available as public datasets taken from Web data. We surveyed 118 datasets related to fake news research on a large scale from three perspectives: (1) fake news detection, (2) fact verification, and (3) other tasks; for example, the analysis of fake news…

Annotation-Scheme Reconstruction for “Fake News” and Japanese Fake News Dataset

This work proposes a novel annotation scheme with fine-grained labeling based on detailed investigations of existing fake news datasets to capture these various aspects of fake news.

FakeSV: A Multimodal Benchmark with Rich Social Context for Fake News Detection on Short Video Platforms

This paper constructs the largest Chinese short video dataset about fake news, named FakeSV, which includes news content, user comments, and publisher comments simultaneously, and provides a new multimodal detection model named SV-FEND, which exploits the cross-modal correlations to select the most informative features and utilizes the social context information for detection.

Methods of Informational Trends Analytics and Fake News Detection on Twitter

Information trends caused by the Russian invasion of Ukraine in 2022 are studied, and the possible impact of these trends on different companies working in Russia during the invasion is considered.

"This is Fake News": Characterizing the Spontaneous Debunking from Twitter Users to COVID-19 False Information

It is found that most fake tweets are left undebunked, and that spontaneous debunking is slower than other forms of response and exhibits partisanship in political topics.

Who Funds Misinformation? A Systematic Analysis of the Ad-related Profit Routines of Fake News sites

Fake news is an age-old phenomenon, widely assumed to be associated with political propaganda published to sway public opinion. Yet, with the growth of social media it has become a lucrative business

CsFEVER and CTKFacts: Acquiring Czech data for fact verification

In this paper, we examine several methods of acquiring Czech data for automated fact-checking, which is a task commonly modeled as a classification of textual claim veracity w.r.t. a corpus of trusted

CsFEVER and CTKFacts: Czech Datasets for Fact Verification

This paper presents two Czech datasets for fact verification, analyzes them for spurious cues (annotation patterns that lead to model overfitting), and describes a method to automatically generate wider claim contexts (dictionaries) for non-hyperlinked corpora.

Detecting and classifying online health misinformation with ‘Content Similarity Measure (CSM)’ algorithm: an automated fact-checking-based approach

An extensive analysis of the proposed algorithm compared with standard similarity measures and machine learning classifiers showed that the ‘content similarity score’ feature outperformed other features with an accuracy of 88.26%.


  • P. Bagla, Kuldeep Kumar
  • Computer Science
    International Journal of Software Science and Computational Intelligence
  • 2023
A model named Text Analysis of Web-based Health Information (TA-WHI), based on an algorithm designed for this, categorizes health-related social media feeds into five categories: sufficient, fabricated, meaningful, advertisement, and misleading.

Statistical learning from Brazilian fake news

The results show that four variables are significant to explain fake news and the model achieved comparable results with state‐of‐the‐art, 0.941 F‐measure, for a single classifier while having the advantage of being a parsimonious model.



Fake news detection: a survey of evaluation datasets

This survey systematically reviews popular datasets for fake news detection by providing insights into the characteristics of each dataset and a comparative analysis among them, along with a set of requirements for comparing and building new datasets.

Survey on Fake News Detection Techniques

This survey comprehensively and systematically studies different methodologies for detecting fake news in digital media, and identifies and specifies fundamental theories in Machine Learning to facilitate and enhance research on fake news detection.

A Survey on Natural Language Processing for Fake News Detection

The challenges involved in fake news detection are described; the task formulations, datasets, and NLP solutions that have been developed for this task are compared; and their potential and limitations are discussed.

Mitigation of Diachronic Bias in Fake News Detection Dataset

This study confirms the bias, especially in proper nouns such as person names, from the deviation of phrase appearances in each dataset, proposes masking methods using Wikidata to mitigate the influence of person names, and validates whether they make fake news detection models robust through experiments with in-domain and out-of-domain data.

Combating Fake News: A Survey on Identification and Mitigation Techniques

This survey describes the modern-day problem of fake news, highlights in particular the technical challenges associated with it, and comprehensively compiles and summarizes characteristic features of available datasets.

Fake News Detection using Temporal Features Extracted via Point Process

This paper proposes a novel multi-modal attention-based method for detecting fake news from SNS posts, which includes linguistic and user features alongside temporal features and uses a point process algorithm to distinguish fake news from real news.

Automatic Detection of Fake News

This paper introduces two novel datasets for the task of fake news detection, covering seven different news domains, and conducts a set of learning experiments to build accurate fake news detectors that can achieve accuracies of up to 76%.

A survey on fake news and rumour detection techniques

Early Detection of Fake News by Utilizing the Credibility of News, Publishers, and Users based on Weakly Supervised Learning

A novel structure-aware multi-head attention network (SMAN) is proposed, which combines the news content, publishing, and reposting relations of publishers and users to jointly optimize the fake news detection and credibility prediction tasks, and can detect fake news in 4 hours with an accuracy of over 91%, which is much faster than the state-of-the-art models.

FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media

A fake news data repository, FakeNewsNet, is presented, which contains two comprehensive data sets with diverse features in news content, social context, and spatiotemporal information, and its potential applications to the study of fake news on social media are discussed.