BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief

@inproceedings{kassner2021beliefbank,
  title={BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief},
  author={Nora Kassner and Oyvind Tafjord and Hinrich Sch{\"u}tze and Peter Clark},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
  year={2021}
}
Although pretrained language models (PTLMs) contain significant amounts of world knowledge, they can still produce inconsistent answers to questions when probed, even after specialized training. As a result, it can be hard to identify what the model actually “believes” about the world, making it susceptible to inconsistent behavior and simple errors. Our goal is to reduce these problems. Our approach is to embed a PTLM in a broader system that also includes an evolving, symbolic memory of… 
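The abstract describes combining a PTLM's raw answer confidences with a symbolic memory of beliefs subject to consistency constraints. A minimal sketch of that idea, with invented statements, confidences, and penalty weights (the paper's actual constraint graph and solver are not reproduced here), is a brute-force search for the truth assignment that best trades off model confidence against violated implication constraints:

```python
from itertools import product

# Hypothetical mini-example of constraint-based belief revision, in the
# spirit of a symbolic belief memory over a PTLM. All statements,
# confidences, and penalties below are invented for illustration.
# Brute force over truth assignments, so only suitable for tiny sets.

def revise_beliefs(confidences, constraints):
    """confidences: {statement: model's P(true)}
    constraints: [(antecedent, consequent, penalty)] meaning A -> B."""
    statements = list(confidences)
    best_assignment, best_score = None, float("-inf")
    for values in product([True, False], repeat=len(statements)):
        assign = dict(zip(statements, values))
        # Reward agreeing with the model's confidence in each belief.
        score = sum(confidences[s] if v else 1 - confidences[s]
                    for s, v in assign.items())
        # Penalize each violated implication A -> B.
        for a, b, pen in constraints:
            if assign[a] and not assign[b]:
                score -= pen
        if score > best_score:
            best_assignment, best_score = assign, score
    return best_assignment

beliefs = {"swallow is a bird": 0.9,
           "swallow can fly": 0.4,      # model is unsure / wrong here
           "swallow lays eggs": 0.8}
rules = [("swallow is a bird", "swallow can fly", 2.0),
         ("swallow is a bird", "swallow lays eggs", 2.0)]
revised = revise_beliefs(beliefs, rules)
# The weak "can fly" belief is flipped to True to satisfy the bird rule.
```

The constraint penalties dominate the model's low confidence in "swallow can fly", so the revised belief set becomes consistent with the taxonomy rules.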


MemoryBank: A Flexible Framework for Systemic Beliefs

This work introduces MemoryBank, which replaces BeliefBank's explicit constraints with a Natural Language Inference model whose constraints are encoded implicitly in its weights. While MemoryBank provides marginal improvement in question-answering consistency and accuracy, the chosen NLI model is ultimately not powerful enough to replace the constraint graph in BeliefBank.

Improving Logical Consistency in Pre-Trained Language Models using Natural Language Inference

It is demonstrated that natural language inference (NLI) can provide additional signal about contradictory statements output by a PTLM, and methods are presented for using these NLI probabilities to define a MaxSAT problem that, when optimized, yields corrected predictions.
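The summary above describes turning NLI contradiction probabilities into a MaxSAT problem over a model's answers. A toy sketch of that construction (all answers, confidences, and probabilities invented; a real system would use a dedicated MaxSAT solver rather than brute force): each answer becomes a soft unit clause weighted by its confidence, and each NLI-detected contradiction becomes a soft clause penalizing keeping both answers.

```python
from itertools import product
import math

# Hypothetical sketch of a MaxSAT-style correction: model answers are
# soft unit clauses weighted by log-confidence; each NLI-detected
# contradiction (a, b) adds a soft clause "not (a and b)" weighted by
# the contradiction probability. Brute force over assignments.

def maxsat_correct(answer_conf, contradictions):
    """answer_conf: {answer: model's P(true)}
    contradictions: {(a, b): NLI P(a contradicts b)}."""
    answers = list(answer_conf)
    best, best_w = None, float("-inf")
    for vals in product([True, False], repeat=len(answers)):
        assign = dict(zip(answers, vals))
        # Log-likelihood of agreeing with the model's confidences.
        w = sum(math.log(answer_conf[a] if v else 1 - answer_conf[a])
                for a, v in assign.items())
        # Penalty for keeping both sides of a likely contradiction.
        for (a, b), p in contradictions.items():
            if assign[a] and assign[b]:
                w += math.log(max(1 - p, 1e-9))
        if w > best_w:
            best, best_w = assign, w
    return best

conf = {"penguins can fly": 0.55,
        "penguins are flightless birds": 0.95}
contra = {("penguins can fly", "penguins are flightless birds"): 0.98}
fixed = maxsat_correct(conf, contra)
# The weakly held, contradicted answer is rejected.
```

Because the contradiction clause makes keeping both answers very costly, the optimizer drops the lower-confidence one, which is exactly the corrective behavior the TLDR describes.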

IntrospectQA: Building Self-reflecting, Consistent Question Answering Models

The preliminary experiments indicate that IntrospectQA can boost performance over a baseline QA model across a wide variety of topics without any pretraining, finetuning, or external constraint graphs, suggesting that leveraging a pretrained NLI model is a potential avenue for improving the logical consistency and accuracy of a QA model.

Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference

This work proposes a framework, Consistency Correction through Relation Detection, or ConCoRD, for boosting the consistency and accuracy of pre-trained NLP models using off-the-shelf natural language inference (NLI) models without re-tuning or re-training.

Towards Teachable Reasoning Systems

Generated chains of reasoning show how answers are implied by the system’s own internal beliefs, and are both faithful and truthful, which suggests new opportunities for using language models in an interactive setting where users can inspect, debug, correct, and improve a system’s performance over time.

Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

Approaches to detecting when models have beliefs about the world, updating model beliefs, and visualizing beliefs graphically are discussed, which suggest that models possess belief-like qualities to only a limited extent, but update methods can both correct incorrect model beliefs and greatly improve their consistency.

Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning

The approach is to recursively combine a trained backward-chaining model, capable of generating a set of premises entailing an answer hypothesis, with a verifier that checks that the model itself believes those premises (and the entailment itself) through self-querying.

Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations

This work develops MAIEUTIC PROMPTING, which aims to infer a correct answer to a question even from the unreliable generations of an LM, and improves robustness in inference while providing interpretable rationales.

Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs

The feasibility of incorporating the main breakpoint transformer, based on T5, into more complex reasoning pipelines is demonstrated, and SOTA performance on the three-tiered reasoning challenge for the TRIP benchmark is obtained.

Judgment aggregation, discursive dilemma and reflective equilibrium: Neural language models as self-improving doxastic agents

Neural language models (NLMs) are susceptible to producing inconsistent output. This paper proposes a new diagnosis as well as a novel remedy for NLMs' incoherence. We train NLMs on synthetic text…

Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge

This work provides a first demonstration that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements, and demonstrates that models learn to effectively perform inference which involves implicit taxonomic and world knowledge, chaining and counting.

Measuring and Improving Consistency in Pretrained Language Models

The creation of PARAREL, a high-quality resource of English paraphrases of cloze-style queries, and analysis of the representational spaces of PLMs suggest that those spaces are poorly structured and currently not suitable for representing knowledge in a robust way.

Transformers as Soft Reasoners over Language

This work trains transformers to reason (or emulate reasoning) over natural language sentences using synthetically generated data, thus bypassing a formal representation and suggesting a new role for transformers, namely as limited "soft theorem provers" operating over explicit theories in language.

REALM: Retrieval-Augmented Language Model Pre-Training

The effectiveness of Retrieval-Augmented Language Model pre-training (REALM) is demonstrated by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA) and is found to outperform all previous methods by a significant margin, while also providing qualitative benefits such as interpretability and modularity.

Language Models as Knowledge Bases?

An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.

What BERT Is Not: Lessons from a New Suite of Psycholinguistic Diagnostics for Language Models

A suite of diagnostics drawn from human language experiments are introduced, which allow us to ask targeted questions about information used by language models for generating predictions in context, and the popular BERT model is applied.

Logic-Guided Data Augmentation and Regularization for Consistent Question Answering

This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions by integrating logic rules and neural models: it leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.

Unsupervised Commonsense Question Answering with Self-Talk

An unsupervised framework based on self-talk, inspired by inquiry-based discovery learning, is proposed as a novel approach to multiple-choice commonsense tasks; it improves performance on several benchmarks and competes with models that obtain knowledge from external KBs.

Obtaining Faithful Interpretations from Compositional Neural Networks

It is found that the intermediate outputs of NMNs differ from the expected output, illustrating that the network structure does not provide a faithful explanation of model behaviour, and particular choices for module architecture are proposed that yield much better faithfulness, at a minimal cost to accuracy.

A Logic-Driven Framework for Consistency of Neural Models

This paper proposes a learning framework for constraining models using logic rules to regularize them away from inconsistency, and instantiate it on natural language inference, where experiments show that enforcing invariants stated in logic can help make the predictions of neural models both accurate and consistent.
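The summary above describes regularizing models away from logical inconsistency. One concrete instance of such a soft-logic constraint in NLI is the symmetry of contradiction: if the model says a contradicts b, it should also say b contradicts a. A minimal sketch (all probabilities invented; the paper's actual framework covers more general invariants) using a product t-norm relaxation of the constraint:

```python
import math

# Hypothetical sketch of a logic-driven consistency loss for NLI:
# contradiction is symmetric, so predicting contradiction(a, b) with
# probability p while predicting contradiction(b, a) with probability q
# should be penalized when p and q disagree. Product t-norm relaxation:
# the soft truth of "p and not q" is p * (1 - q), and we take -log of
# the constraint being satisfied.

def symmetry_loss(p_contra_ab, p_contra_ba, eps=1e-8):
    # Penalize "a contradicts b" without "b contradicts a"...
    forward = -math.log(max(1 - p_contra_ab * (1 - p_contra_ba), eps))
    # ...and the reverse direction.
    backward = -math.log(max(1 - p_contra_ba * (1 - p_contra_ab), eps))
    return forward + backward

consistent = symmetry_loss(0.9, 0.9)    # both directions agree: small loss
inconsistent = symmetry_loss(0.9, 0.1)  # asymmetric predictions: large loss
```

During training, a term like this would be added to the usual cross-entropy loss, pushing the model's paired predictions toward logically consistent configurations without hard constraints.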