CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking

@article{Hu2022CHEFAP,
  title={CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking},
  author={Xuming Hu and Zhijiang Guo and Guan-Huei Wu and Aiwei Liu and Lijie Wen and Philip S. Yu},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.11863}
}
The explosion of misinformation spreading in the media ecosystem calls for automated fact-checking. While misinformation spans both geographic and linguistic boundaries, most work in the field has focused on English. Datasets and tools available in other languages, such as Chinese, are limited. To bridge this gap, we construct CHEF, the first CHinese Evidence-based Fact-checking dataset of 10K real-world claims. The dataset covers multiple domains, ranging from politics to public…

References

MultiFC: A Real-World Multi-Domain Dataset for Evidence-Based Fact Checking of Claims

This paper presents an in-depth analysis of the largest publicly available dataset of naturally occurring factual claims, collected from 26 English-language fact-checking websites, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists.

X-Fact: A New Benchmark Dataset for Multilingual Fact Checking

This paper introduces the largest publicly available multilingual dataset for factual verification of naturally existing real-world claims and develops several automated fact-checking models that make use of additional metadata and evidence from news stories retrieved using a search engine.

FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information

This paper introduces a novel dataset and benchmark, Fact Extraction and VERification Over Unstructured and Structured information (FEVEROUS), which consists of 87,026 verified claims, and develops a baseline for verifying claims against text and tables that predicts both the correct evidence and verdict for 18% of the claims.

TabFact: A Large-scale Dataset for Table-based Fact Verification

This paper constructs a large-scale dataset with 16k Wikipedia tables as the evidence for 118k human-annotated natural language statements, each labeled as either ENTAILED or REFUTED, and designs two different models: Table-BERT and Latent Program Algorithm (LPA).

FEVER: a Large-scale Dataset for Fact Extraction and VERification

This paper introduces a new publicly available dataset for verification against textual sources, FEVER, which consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from.

Explainable Automated Fact-Checking for Public Health Claims

The results indicate that, by training on in-domain data, gains can be made in explainable, automated fact-checking for claims which require specific expertise.

Integrating Stance Detection and Fact Checking in a Unified Corpus

This paper supports the interdependencies between fact checking, document retrieval, source credibility, stance detection and rationale extraction as annotations in the same corpus and implements this setup on an Arabic fact checking corpus, the first of its kind.

A Survey on Automated Fact-Checking

This paper surveys automated fact-checking stemming from natural language processing, and presents an overview of existing datasets and models, aiming to unify the various definitions given and identify common concepts.

HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification

It is shown that the performance of an existing state-of-the-art semantic-matching model degrades significantly on this dataset as the number of reasoning hops increases, hence demonstrating the necessity of many-hop reasoning to achieve strong results.

GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification

This paper proposes a graph-based evidence aggregating and reasoning (GEAR) framework, which enables information to transfer on a fully-connected evidence graph and then uses different aggregators to collect multi-evidence information.