Measuring and Improving Consistency in Pretrained Language Models

Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard H. Hovy, Hinrich Schütze, Yoav Goldberg. Transactions of the Association for Computational Linguistics.
Abstract: Consistency of a model—that is, the invariance of its behavior under meaning-preserving alternations in its input—is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel🤘, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel🤘, we show that the…
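The consistency measure sketched in the abstract can be illustrated with a minimal toy: score a relation as the fraction of paraphrase pairs on which the model's top-1 prediction agrees. The function and the example predictions below are hypothetical stand-ins, not ParaRel's actual implementation.

```python
from itertools import combinations

def consistency(predictions):
    """Fraction of paraphrase pairs whose top-1 predictions agree.

    `predictions` maps each paraphrase of one cloze query
    (e.g. "[X] was born in [Y]" vs. "[X] is a native of [Y]")
    to the model's top-ranked answer for the same subject.
    """
    pairs = list(combinations(predictions.values(), 2))
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

# Toy example: three paraphrases of one query; two of them agree.
preds = {
    "[X] was born in [Y].": "Paris",
    "[X] is a native of [Y].": "Paris",
    "The birthplace of [X] is [Y].": "Lyon",
}
score = consistency(preds)  # 1 agreeing pair out of 3
```

Averaging this pairwise-agreement score over all subjects of a relation gives a per-relation consistency number, independent of whether the predictions are factually correct.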

Pre-training Language Models with Deterministic Factual Knowledge

The factual-knowledge probing experiments indicate that the continually pre-trained PLMs capture factual knowledge more robustly, and that training them to learn deterministic relationships with the proposed methods also helps on other knowledge-intensive tasks.

Factual Consistency of Multilingual Pretrained Language Models

mBERT is as inconsistent as English BERT on English paraphrases, and both mBERT and XLM-R exhibit a high degree of inconsistency in English and even more so in the other 45 languages.

Measuring Reliability of Large Language Models through Semantic Consistency

A measure of semantic consistency is developed that allows the comparison of open-ended text outputs; it is considerably more consistent than traditional metrics based on lexical consistency, and also correlates more strongly with human evaluation of output consistency.
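The gap between lexical and semantic consistency can be made concrete with a toy contrast: exact-string agreement versus agreement under a semantic-equivalence predicate. The real measure would use something like an entailment model for equivalence; the `norm` function below is a deliberately crude stand-in.

```python
def lexical_consistency(outputs):
    """Pairwise exact-string agreement over a list of model outputs."""
    pairs = [(a, b) for i, a in enumerate(outputs) for b in outputs[i + 1:]]
    return sum(a == b for a, b in pairs) / len(pairs)

def semantic_consistency(outputs, equivalent):
    """Pairwise agreement under a semantic-equivalence predicate.
    `equivalent` stands in for a learned model (e.g. entailment)."""
    pairs = [(a, b) for i, a in enumerate(outputs) for b in outputs[i + 1:]]
    return sum(equivalent(a, b) for a, b in pairs) / len(pairs)

# Three answers that mean the same thing but never match as strings.
outs = ["Paris", "paris.", "The city of Paris"]
norm = lambda s: "paris" if "paris" in s.lower() else s.lower()
lex = lexical_consistency(outs)                                    # 0.0
sem = semantic_consistency(outs, lambda a, b: norm(a) == norm(b))  # 1.0
```

A lexical metric reports total inconsistency here even though the model is semantically perfectly consistent, which is the mismatch the paper's measure is built to avoid.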

Calibrating Factual Knowledge in Pretrained Language Models

This work proposes a simple and lightweight method to calibrate factual knowledge in PLMs without re-training from scratch, and demonstrates the method's effectiveness and efficiency.

Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence

A novel intermediate training task, named meaning-matching, is proposed; designed to directly learn a meaning-text correspondence, it enables PLMs to learn lexical semantic information and proves to be a safe intermediate task that guarantees similar or better performance on downstream tasks.

A Review on Language Models as Knowledge Bases

This paper presents a set of aspects that an LM should have to fully act as a KB, and reviews the recent literature with respect to those aspects.

Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View

This paper investigates the prompt-based probing from a causal view, highlights three critical biases which could induce biased results and conclusions, and proposes to conduct debiasing via causal intervention.

P-Adapters: Robustly Extracting Factual Information from Language Models with Diverse Prompts

This work investigates what makes a P-Adapter successful and concludes that access to the LLM's embeddings of the original natural-language prompt, particularly the subject of the entity pair being asked about, is a significant factor.

Knowledge Neurons in Pretrained Transformers

This paper examines the fill-in-the-blank cloze task for BERT and proposes a knowledge attribution method to identify the neurons that express the fact, finding that the activation of such knowledge neurons is positively correlated to the expression of their corresponding facts.
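The attribution idea can be sketched with integrated gradients, the technique the knowledge-neurons paper builds on: attribute a fact score to each feed-forward neuron by integrating its gradient along a path from a zero baseline to its actual activation. The linear "fact score" and the numbers below are a hypothetical toy, chosen because integrated gradients then reduces exactly to activation × weight.

```python
def integrated_gradients(f, h, steps=50):
    """Riemann-sum approximation of integrated gradients from a zero
    baseline: attr_i ≈ h_i * mean over the path of ∂f/∂h_i."""
    n = len(h)
    grads = [0.0] * n
    eps = 1e-6
    for k in range(1, steps + 1):
        point = [(k / steps) * v for v in h]
        for i in range(n):
            bumped = list(point)
            bumped[i] += eps
            grads[i] += (f(bumped) - f(point)) / eps  # finite-difference gradient
    return [h[i] * grads[i] / steps for i in range(n)]

# Toy "fact score": a linear readout over three hypothetical
# feed-forward neuron activations. Neuron 1 (high activation,
# high weight) dominates -- the "knowledge neuron" for this fact.
w = [0.1, 2.0, 0.05]
f = lambda h: sum(wi * hi for wi, hi in zip(w, h))
attrs = integrated_gradients(f, [0.5, 3.0, 1.0])
```

Ranking neurons by `attrs` and keeping the top few is, in miniature, how the paper identifies candidate knowledge neurons before testing whether suppressing or amplifying them changes the expressed fact.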

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

Approaches to detecting when models have beliefs about the world, updating model beliefs, and visualizing beliefs graphically are discussed; these suggest that models possess belief-like qualities to only a limited extent, but update methods can both correct incorrect model beliefs and greatly improve their consistency.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
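BERT's masked-LM pretraining objective can be sketched concretely: select roughly 15% of token positions, then replace 80% of those with `[MASK]`, 10% with a random vocabulary token, and leave 10% unchanged. The helper below is a toy sketch of that corruption scheme, not the actual BERT implementation.

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, rng=None):
    """BERT-style MLM corruption: pick ~15% of positions; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged.
    Returns (corrupted token list, list of target positions)."""
    rng = rng or random.Random(0)
    out, targets = list(tokens), []
    for i in range(len(tokens)):
        if rng.random() < mask_rate:
            targets.append(i)
            r = rng.random()
            if r < 0.8:
                out[i] = "[MASK]"
            elif r < 0.9:
                out[i] = rng.choice(vocab)
            # else: keep the original token as-is
    return out, targets

corrupted, targets = mask_tokens(
    "the capital of france is paris".split(),
    vocab=["the", "of", "is", "paris", "london"],
)
```

The model is trained to predict the original token at each target position, which is also why fill-in-the-blank cloze probing (as in several papers above) is such a natural fit for BERT-style models.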

T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples

T-REx, a dataset of large scale alignments between Wikipedia abstracts and Wikidata triples, is presented, which is two orders of magnitude larger than the largest available alignments dataset and covers 2.5 times more predicates.

Language Models as Knowledge Bases?

An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.

Enriching a Model's Notion of Belief using a Persistent Memory

This work adds a memory component, a BeliefBank, that records a model's answers, along with two mechanisms that use it to improve consistency among beliefs, and shows that, in a controlled experimental setting, these two mechanisms improve both accuracy and consistency.
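The core of a BeliefBank-style consistency check can be sketched as recorded yes/no answers plus logical constraints between them; a repair mechanism then flips beliefs that break constraints. The constraint format and example facts below are illustrative assumptions, not the paper's actual schema.

```python
def violated(beliefs, constraints):
    """Return constraints of the form (premise, implied, expected)
    that the current yes/no beliefs break: if `premise` is believed
    True, then `implied` must equal `expected`."""
    bad = []
    for premise, implied, expected in constraints:
        if beliefs.get(premise) and beliefs.get(implied) is not None \
                and beliefs[implied] != expected:
            bad.append((premise, implied, expected))
    return bad

# Toy belief bank for one entity ("a swallow").
beliefs = {"is a bird": True, "can fly": False, "is a mammal": False}
constraints = [
    ("is a bird", "can fly", True),       # birds (typically) fly
    ("is a bird", "is a mammal", False),  # birds are not mammals
]
bad = violated(beliefs, constraints)  # the "can fly" constraint is broken
beliefs["can fly"] = True             # a repair step flips the weaker belief
```

In the paper's setting the choice of which belief to flip is guided by model confidence and constraint weights; the sketch only shows the detection step that makes such repairs possible.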

Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models

This work translates the established benchmarks T-REx and Google-RE into 53 languages and finds that using mBERT as a knowledge base yields varying performance across languages, and that pooling predictions across languages improves performance.
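One simple form of cross-language pooling is to average each candidate answer's probability over the languages in which the query was posed and take the argmax; the function and toy distributions below are an illustrative sketch, one of several pooling schemes such work might use.

```python
from collections import defaultdict

def pool_predictions(per_language_probs):
    """Average each candidate's probability across languages and
    return (best candidate, pooled distribution)."""
    totals = defaultdict(float)
    for probs in per_language_probs.values():
        for cand, p in probs.items():
            totals[cand] += p
    n_langs = len(per_language_probs)
    pooled = {c: totals[c] / n_langs for c in totals}
    return max(pooled, key=pooled.get), pooled

# Toy: "[X] was born in [MASK]" queried in three languages.
probs = {
    "en": {"Paris": 0.6, "Lyon": 0.4},
    "fr": {"Paris": 0.7, "Lyon": 0.3},
    "de": {"Lyon": 0.55, "Paris": 0.45},
}
best, pooled = pool_predictions(probs)  # "Paris" wins on average
```

Even though the German query alone prefers the wrong answer, the pooled distribution recovers the majority answer, which is the effect the paper reports.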

Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

The inability to infer behavioral conclusions from probing results is pointed out, and an alternative method is offered that focuses on how the information is being used, rather than on what information is encoded.
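Amnesic probing relies on iterative nullspace projection (INLP) to erase a property from representations and then observe how behavior changes. A single projection step can be sketched in plain Python: remove the component of each vector along a learned property direction, after which no linear probe can recover the property from that direction. The direction and vectors below are toy values.

```python
def remove_direction(vectors, w):
    """Project out direction w from every vector:
    x <- x - ((x·w) / (w·w)) w.  Afterwards x·w == 0 for all x,
    so the property linearly encoded along w is erased."""
    ww = sum(wi * wi for wi in w)
    out = []
    for x in vectors:
        coef = sum(xi * wi for xi, wi in zip(x, w)) / ww
        out.append([xi - coef * wi for xi, wi in zip(x, w)])
    return out

# Toy: the probed property (say, part of speech) lies along w.
w = [1.0, 0.0]
vecs = [[2.0, 3.0], [-1.5, 0.5]]
cleaned = remove_direction(vecs, w)  # first coordinate zeroed out
```

The full method repeats this with freshly trained probe directions until the property is no longer linearly decodable, then measures the drop in the model's downstream behavior.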

On the Systematicity of Probing Contextualized Word Representations: The Case of Hypernymy in BERT

The main conclusion is cautionary: even if BERT demonstrates high probing accuracy for a particular competence, it does not necessarily follow that BERT ‘understands’ a concept, and it cannot be expected to systematically generalize across applicable contexts.

Do Neural Language Models Overcome Reporting Bias?

It is found that while pre-trained language models' generalization capacity allows them to better estimate the plausibility of frequent but rarely stated actions, outcomes, and properties, they also tend to overestimate that of the very rare, amplifying the bias already present in their training corpus.

Eliciting Knowledge from Language Models Using Automatically Generated Prompts

The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems (e.g., cloze tests) is a natural approach for gauging such knowledge.
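The automatic prompt search described here can be caricatured as coordinate ascent over trigger tokens: for each prompt slot, try candidate tokens and keep whichever maximizes a scoring function. AutoPrompt guides this search with gradients over the model's vocabulary; the brute-force toy below, with a made-up scorer, only illustrates the search loop itself.

```python
def greedy_prompt_search(vocab, n_slots, score):
    """Greedy coordinate ascent over trigger tokens: for each slot
    in turn, try every vocabulary token and keep the one that
    maximizes score(prompt)."""
    prompt = [vocab[0]] * n_slots
    for slot in range(n_slots):
        prompt[slot] = max(
            vocab,
            key=lambda t: score(prompt[:slot] + [t] + prompt[slot + 1:]),
        )
    return prompt

# Toy scorer that rewards matching a particular trigger sequence
# (standing in for "probability the LM fills the blank correctly").
target = ["born", "in"]
score = lambda p: sum(a == b for a, b in zip(p, target))
found = greedy_prompt_search(["the", "born", "in", "of"], 2, score)
```

In the real method the scorer is the masked LM's probability of the gold answer, and candidate tokens per slot are shortlisted via a gradient-based first-order approximation rather than exhaustive enumeration.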