He Thinks He Knows Better than the Doctors: BERT for Event Factuality Fails on Pragmatics

Nan-Jiang Jiang and Marie-Catherine de Marneffe. Transactions of the Association for Computational Linguistics.
Abstract

We investigate how well BERT performs at predicting factuality in several existing English datasets, encompassing various linguistic constructions. Although BERT obtains strong performance on most datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, and it fails on instances where pragmatic reasoning is necessary. Contrary to what the high performance suggests, we are still far from having a robust system for factuality prediction.

Re-Examining FactBank: Predicting the Author’s Presentation of Factuality

It is argued that F-measure is an important alternative evaluation metric for factuality, and new state-of-the-art results are provided for four corpora, including FactBank.

Measure More, Question More: Experimental Studies on Transformer-based Language Models and Complement Coercion

Transformer-based language models have shown strong performance on an array of natural language understanding tasks. However, the question of how these models react to implicit meaning has been

Polish Natural Language Inference and Factivity - an Expert-based Dataset and Benchmarks

A new dataset that focuses exclusively on the factivity phenomenon is contributed; BERT-based models consuming only the input sentences capture most of the complexity of NLI/factivity.

Navigating the Grey Area: Expressions of Overconfidence and Uncertainty in Language Models

Despite increasingly fluent, relevant, and coherent language generation, major gaps remain between how humans and machines use language. We argue that a key dimension that is missing from our

Examining Political Rhetoric with Epistemic Stance Detection

Participants in political discourse employ rhetorical strategies—such as hedging, attributions, or denials—to display varying degrees of belief commitments to claims proposed by themselves or others.

End-to-End Event Factuality Identification with Cross-Lingual Information

An end-to-end joint model, JESF, is proposed, which uses BERT to encode sentences, enriches the semantic representation with linguistic features, and then uses a BiLSTM to capture the serialized semantic features of the sentences.

(QA)2: Question Answering with Questionable Assumptions

(QA)2 (Question Answering with Questionable Assumptions), an open-domain evaluation dataset consisting of naturally occurring search-engine queries that may or may not contain questionable assumptions, is proposed.

Crude Oil-related Events Extraction and Processing: A Transfer Learning Approach

A complete framework for extracting and processing crude oil-related events found in the CrudeOilNews corpus is presented, addressing annotation scarcity and class imbalance by leveraging the effectiveness of transfer learning.

Event Detection and Factuality Assessment with Non-Expert Supervision

It is found that non-experts, with very little training, can reliably provide judgments about what events are mentioned and the extent to which the author thinks they actually happened.

Neural Models of Factuality

A substantial expansion of the It Happened portion of the Universal Decompositional Semantics dataset is presented, yielding the largest event factuality dataset to date.

Lexicosyntactic Inference in Neural Models

This work builds a factuality judgment dataset for all English clause-embedding verbs in various syntactic contexts and probes the behavior of current state-of-the-art neural systems, showing that these systems make certain systematic errors that are clearly visible through the lens of factuality prediction.

How well do NLI models capture verb veridicality?

It is shown that, encouragingly, BERT’s inferences are sensitive not only to the presence of individual verb types, but also to the syntactic role of the verb, the form of the complement clause (to- vs. that-complements), and negation.

Integrating Deep Linguistic Features in Factuality Prediction over Unified Datasets

This work proposes an intuitive method for mapping three previously annotated corpora onto a single factuality scale, thereby enabling models to be tested across these corpora. It also designs a novel model for factuality prediction by extending a previous rule-based factuality prediction system and applying it over an abstraction of dependency trees.

Are Natural Language Inference Models IMPPRESsive? Learning IMPlicature and PRESupposition

It is found that BERT learns to draw pragmatic inferences, and that NLI training encourages models to learn some, but not all, pragmatic inferences.

Evaluating BERT for natural language inference: A case study on the CommitmentBank

Analysis of model behavior shows that the BERT models still do not capture the full complexity of pragmatic reasoning, nor encode some of the linguistic generalizations, highlighting room for improvement.

FactBank: a corpus annotated with event factuality

FactBank is a corpus annotated with information concerning the factuality of events; the annotation is carried out within a descriptive framework of factuality grounded in both theoretical findings and data analysis.

Harnessing the linguistic signal to predict scalar inferences

This work shows that an LSTM-based sentence encoder trained on an English dataset of human inference strength ratings is able to predict ratings with high accuracy, and probes the model’s behavior using manually constructed minimal sentence pairs and corpus data.

What projects and why

Projection is widely used as a diagnostic for presupposition, but many expression types yield projection even though they do not have standard properties of presupposition, for example appositives,