CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking

@inproceedings{Hu2022CHEFAP,
  title={CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking},
  author={Xuming Hu and Zhijiang Guo and Guan-Huei Wu and Aiwei Liu and Lijie Wen and Philip S. Yu},
  booktitle={NAACL},
  year={2022}
}
The explosion of misinformation spreading through the media ecosystem calls for automated fact-checking. While misinformation spans both geographic and linguistic boundaries, most work in the field has focused on English. Datasets and tools available in other languages, such as Chinese, are limited. In order to bridge this gap, we construct CHEF, the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims. The dataset covers multiple domains, ranging from politics to public… 

References

Showing 1-10 of 45 references

MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims

Presents an in-depth analysis of the largest publicly available dataset of naturally occurring factual claims, collected from 26 fact-checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by expert human journalists.

X-Fact: A New Benchmark Dataset for Multilingual Fact Checking

The largest publicly available multilingual dataset for factual verification of naturally existing real-world claims is introduced and several automated fact-checking models are developed that make use of additional metadata and evidence from news stories retrieved using a search engine.

FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information

This paper introduces a novel dataset and benchmark, Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), which consists of 87,026 verified claims and develops a baseline for verifying claims against text and tables which predicts both the correct evidence and verdict for 18% of the claims.

TabFact: A Large-scale Dataset for Table-based Fact Verification

Constructs a large-scale dataset with 16k Wikipedia tables as evidence for 118k human-annotated natural language statements, each labeled as either ENTAILED or REFUTED, and designs two different models: Table-BERT and the Latent Program Algorithm (LPA).

FEVER: a Large-scale Dataset for Fact Extraction and VERification

This paper introduces a new publicly available dataset for verification against textual sources, FEVER, which consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from.

Explainable Automated Fact-Checking for Public Health Claims

The results indicate that, by training on in-domain data, gains can be made in explainable, automated fact-checking for claims which require specific expertise.

A Survey on Automated Fact-Checking

This paper surveys automated fact-checking stemming from natural language processing, and presents an overview of existing datasets and models, aiming to unify the various definitions given and identify common concepts.

HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification

It is shown that the performance of an existing state-of-the-art semantic-matching model degrades significantly on this dataset as the number of reasoning hops increases, hence demonstrating the necessity of many-hop reasoning to achieve strong results.

GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification

A graph-based evidence aggregating and reasoning (GEAR) framework which enables information to transfer on a fully-connected evidence graph and then utilizes different aggregators to collect multi-evidence information is proposed.

FakeCovid - A Multilingual Cross-domain Fact Check News Dataset for COVID-19

Builds a classifier to detect fake news, presents results for automatic fake-news detection and classification, and manually annotates the fact-checked articles into 11 different categories according to their content.