A Richly Annotated Corpus for Different Tasks in Automated Fact-Checking

  Andreas Hanselowski, Christian Stab, Claudia Schulz, Zile Li, Iryna Gurevych
Automated fact-checking based on machine learning is a promising approach to identify false information distributed on the web. In order to achieve satisfactory performance, machine learning methods require a large corpus with reliable annotations for the different tasks in the fact-checking process. Having analyzed existing fact-checking corpora, we found that none of them meets these criteria in full. They are either too small in size, do not provide detailed annotations, or are limited to a… 


Automatic Fact-Checking with Document-level Annotations using BERT and Multiple Instance Learning
This paper tackles the natural language inference (NLI) subtask (given a document and a sentence-level claim, determine whether the document supports or refutes the claim) using only document-level annotations, significantly outperforming existing results on the WikiFactCheck-English dataset.
Evidence-based Verification for Real World Information Needs
A novel claim verification dataset with instances derived from search-engine queries, yielding 10,987 claims annotated with evidence that reflect real-world information needs; the dataset enables systems to use evidence extraction to summarize a rationale for an end user while maintaining accuracy when predicting a claim's veracity.
FaVIQ: FAct Verification from Information-seeking Questions
This paper constructs FAVIQ, a large-scale, challenging fact verification dataset of 188K claims derived from an existing corpus of ambiguous information-seeking questions; the claims are verified to be natural, to contain little lexical bias, and to require a complete understanding of the evidence for verification.
A Survey on Automated Fact-Checking
This paper surveys automated fact-checking from a natural language processing perspective, presenting an overview of existing datasets and models and aiming to unify the various definitions given and identify common concepts.
The Case for Claim Difficulty Assessment in Automatic Fact Checking
This paper argues that claim-difficulty prediction is a missing component of today's automated fact-checking architectures and describes how this difficulty prediction task might be split into a set of distinct subtasks.
Automated Fact-Checking: A Survey
This paper reviews relevant research on automated fact-checking covering both the claim detection and claim validation components.
FEVEROUS: Fact Extraction and VERification Over Unstructured and Structured information
This paper introduces FEVEROUS, a novel dataset and benchmark for Fact Extraction and VERification Over Unstructured and Structured information, consisting of 87,026 verified claims, and develops a baseline for verifying claims against text and tables that predicts both the correct evidence and verdict for 18% of the claims.
CHEF: A Pilot Chinese Dataset for Evidence-Based Fact-Checking
CHEF, the first CHinese Evidence-based Fact-checking dataset, comprising 10K real-world claims, is constructed; the accompanying approach models evidence retrieval as a latent variable, allowing joint training with the veracity prediction model in an end-to-end fashion.
Generating Fact Checking Briefs
This work investigates how to increase the accuracy and efficiency of fact checking by providing information about the claim before performing the check, in the form of natural language briefs, and develops QABriefer, a model that generates a set of questions conditioned on the claim, searches the web for evidence, and generates answers.

References
FEVER: a Large-scale Dataset for Fact Extraction and VERification
This paper introduces a new publicly available dataset for verification against textual sources, FEVER, which consists of 185,445 claims generated by altering sentences extracted from Wikipedia and subsequently verified without knowledge of the sentence they were derived from.
A large annotated corpus for learning natural language inference
The Stanford Natural Language Inference corpus is introduced, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning, which allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
Fact Checking: Task definition and dataset construction
The task of fact checking is introduced, and the construction of a publicly available dataset from statements fact-checked by journalists online is detailed, along with baseline approaches for the task and the challenges that need to be addressed.
UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification
This paper presents a claim verification pipeline approach that, according to preliminary results, scored third out of 23 competing systems in the shared task, and introduces two extensions to the Enhanced LSTM (ESIM).
Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources
This work uses a corpus of heterogeneous documents to address an issue that information seekers usually face, namely a variety of different types of information sources, and finds that the manually created corpus is of high quality and has the potential to bridge the gap between reference corpora of abstracts and automatic methods producing extracts.
Automatic Summarization of Open-Domain Multiparty Dialogues in Diverse Genres
  • K. Zechner, Computational Linguistics, 2002
The task and the challenges involved are introduced and motivated, and an approach for producing automatic-extract summaries from human transcripts of multiparty dialogues of four different genres is presented, without any restriction on domain.
The Argument Reasoning Comprehension Task
This article defines a new task, argument reasoning comprehension, proposes a complex yet scalable crowdsourcing process, and creates a new freely licensed dataset based on authentic arguments from news comments, revealing that current methods lack the capability to solve the task.
Survey Article: Inter-Coder Agreement for Computational Linguistics
It is argued that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in computational linguistics, may be more appropriate for many corpus annotation tasks, but that their use makes interpreting the value of the coefficient even harder.
"Liar, Liar Pants on Fire": A New Benchmark Dataset for Fake News Detection
This paper presents LIAR, a new, publicly available dataset for fake news detection, and designs a novel hybrid convolutional neural network that integrates metadata with text to improve on a text-only deep learning model.
Beyond Generic Summarization: A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data
A new approach for creating hierarchical summarization corpora is presented: first, relevant content is extracted from large, heterogeneous document collections using crowdsourcing; second, the relevant information is ordered hierarchically by trained annotators.