Measuring and Improving Consistency in Pretrained Language Models
@article{Elazar2021MeasuringAI,
  title   = {Measuring and Improving Consistency in Pretrained Language Models},
  author  = {Yanai Elazar and Nora Kassner and Shauli Ravfogel and Abhilasha Ravichander and Eduard H. Hovy and Hinrich Sch{\"u}tze and Yoav Goldberg},
  journal = {Transactions of the Association for Computational Linguistics},
  year    = {2021},
  volume  = {9},
  pages   = {1012-1031}
}
Abstract
Consistency of a model—that is, the invariance of its behavior under meaning-preserving alternations in its input—is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel🤘, a high-quality resource of cloze-style query English paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel🤘, we show that the…
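As a rough illustration of the consistency question the abstract raises, the sketch below queries a masked LM with paraphrased cloze templates and checks whether the top-ranked prediction is invariant across them. The model name, templates, and fact are illustrative assumptions, not items drawn from ParaRel itself.

```python
# Minimal consistency check under the stated assumptions: a fact counts as
# consistently predicted if every paraphrased cloze template yields the same
# top-ranked object from the masked LM.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

# Hypothetical paraphrases of a "capital of" query for a single subject.
templates = [
    "The capital of France is [MASK].",
    "France's capital is [MASK].",
    "[MASK] is the capital of France.",
]

top_predictions = [fill(t)[0]["token_str"] for t in templates]
consistent = len(set(top_predictions)) == 1
print(top_predictions, "consistent:", consistent)
```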
65 Citations
Pre-training Language Models with Deterministic Factual Knowledge
- Computer Science · ArXiv
- 2022
Factual knowledge probing experiments indicate that the continuously pre-trained PLMs are more robust at capturing factual knowledge, and that learning deterministic relationships with the proposed methods also helps other knowledge-intensive tasks.
Factual Consistency of Multilingual Pretrained Language Models
- Computer Science, Linguistics · FINDINGS
- 2022
mBERT is found to be as inconsistent as English BERT on English paraphrases, and both mBERT and XLM-R exhibit a high degree of inconsistency in English and even more so in all of the other 45 languages.
Measuring Reliability of Large Language Models through Semantic Consistency
- Computer Science · ArXiv
- 2022
A measure of semantic consistency that allows comparison of open-ended text outputs is developed; it is considerably more consistent than traditional metrics based on lexical consistency, and it correlates more strongly with human evaluation of output consistency (a hedged sketch of such a measure follows below).
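A minimal sketch of the kind of measure described above, contrasting exact-match lexical consistency with an embedding-based semantic score over free-form answers; the sentence encoder (all-MiniLM-L6-v2 via sentence-transformers) and the mean pairwise cosine aggregation are assumptions for illustration, not the paper's actual metric.

```python
# Compare a brittle lexical consistency score (exact string match) with a
# semantic score (mean pairwise cosine similarity of sentence embeddings).
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

answers = [
    "Barack Obama was born in Honolulu.",
    "Obama's birthplace is Honolulu, Hawaii.",
    "He was born in Honolulu.",
]

pairs = list(combinations(range(len(answers)), 2))

# Lexical consistency: fraction of answer pairs that match exactly.
lexical = sum(answers[i] == answers[j] for i, j in pairs) / len(pairs)

# Semantic consistency: average cosine similarity between answer embeddings.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(answers, convert_to_tensor=True)
semantic = sum(float(util.cos_sim(embeddings[i], embeddings[j])) for i, j in pairs) / len(pairs)

print(f"lexical consistency = {lexical:.2f}, semantic consistency = {semantic:.2f}")
```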
Calibrating Factual Knowledge in Pretrained Language Models
- Computer Science · ArXiv
- 2022
This work proposes a simple and lightweight method to calibrate factual knowledge in PLMs without re-training from scratch, and demonstrates its effectiveness and efficiency.
Beyond Distributional Hypothesis: Let Language Models Learn Meaning-Text Correspondence
- Computer Science · NAACL-HLT
- 2022
A novel intermediate training task, named meaning-matching, is proposed to directly learn a meaning-text correspondence; it enables PLMs to learn lexical semantic information and proves to be a safe intermediate task that yields similar or better downstream performance.
A Review on Language Models as Knowledge Bases
- Computer Science · ArXiv
- 2022
This paper presents a set of aspects that an LM is deemed to need in order to fully act as a KB, and reviews the recent literature with respect to those aspects.
Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View
- Computer Science · ACL
- 2022
This paper investigates prompt-based probing from a causal view, highlights three critical biases that could induce biased results and conclusions, and proposes debiasing via causal intervention.
P-Adapters: Robustly Extracting Factual Information from Language Models with Diverse Prompts
- Computer Science · ICLR
- 2022
An investigation of what makes a P-Adapter successful concludes that access to the LLM’s embeddings of the original natural language prompt, particularly of the subject of the entity pair being asked about, is a significant factor.
Knowledge Neurons in Pretrained Transformers
- Computer Science · ACL
- 2022
This paper examines the fill-in-the-blank cloze task for BERT and proposes a knowledge attribution method to identify the neurons that express a fact, finding that the activation of such knowledge neurons is positively correlated with the expression of their corresponding facts.
Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs
- Computer Science · ArXiv
- 2021
Approaches to detecting when models have beliefs about the world, updating model beliefs, and visualizing beliefs graphically are discussed; the results suggest that models possess belief-like qualities to only a limited extent, but that update methods can both correct incorrect model beliefs and greatly improve their consistency.
References
Showing 1-10 of 83 references
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Computer Science · NAACL
- 2019
A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples
- Computer Science · LREC
- 2018
T-REx, a dataset of large-scale alignments between Wikipedia abstracts and Wikidata triples, is presented; it is two orders of magnitude larger than the largest available alignment dataset and covers 2.5 times more predicates.
Language Models as Knowledge Bases?
- Computer Science · EMNLP
- 2019
An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.
Enriching a Model's Notion of Belief using a Persistent Memory
- Computer Science · ArXiv
- 2021
This work adds a memory component, a BeliefBank, that records a model’s answers, along with two mechanisms that use it to improve consistency among beliefs, and shows that, in a controlled experimental setting, these two mechanisms improve both accuracy and consistency.
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models
- Computer Science, Linguistics · EACL
- 2021
This work translates the established benchmarks T-REx and Google-RE into 53 languages and finds that using mBERT as a knowledge base yields varying performance across languages, and that pooling predictions across languages improves performance.
Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
- Psychology · Transactions of the Association for Computational Linguistics
- 2021
The inability to infer behavioral conclusions from probing results is pointed out, and an alternative method is offered that focuses on how the information is being used rather than on what information is encoded.
On the Systematicity of Probing Contextualized Word Representations: The Case of Hypernymy in BERT
- Psychology · STARSEM
- 2020
The main conclusion is cautionary: even if BERT demonstrates high probing accuracy for a particular competence, it does not necessarily follow that BERT ‘understands’ a concept, and it cannot be expected to systematically generalize across applicable contexts.
Do Neural Language Models Overcome Reporting Bias?
- Psychology · COLING
- 2020
It is found that while pre-trained language models' generalization capacity allows them to better estimate the plausibility of frequent but unspoken-of actions, outcomes, and properties, they also tend to overestimate the plausibility of very rare ones, amplifying the bias that already exists in their training corpus.
Eliciting Knowledge from Language Models Using Automatically Generated Prompts
- Computer Science · EMNLP
- 2020
The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems…