Language Models as Knowledge Bases?

Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel
Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks. Whilst learning linguistic knowledge, these models may also be storing relational knowledge present in the training data, and may be able to answer queries structured as “fill-in-the-blank” cloze statements. Language models have many advantages over structured knowledge bases: they require no schema engineering, allow practitioners to query about an open class of… 
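
To make the "fill-in-the-blank" cloze idea concrete, here is a minimal sketch of turning a (subject, object) fact into a cloze query and scoring ranked predictions with precision at k. The template string, the `[MASK]` token, the example facts, and the `toy_rank` stand-in for a language model are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"  # assumed mask token; real models define their own


def to_cloze(subject: str, template: str) -> str:
    """Fill the subject slot [X] and replace the object slot [Y] with the mask."""
    return template.replace("[X]", subject).replace("[Y]", MASK)


def precision_at_k(
    facts: List[Tuple[str, str]],
    template: str,
    rank_fn: Callable[[str], List[str]],
    k: int = 1,
) -> float:
    """Fraction of facts whose gold object appears in the top-k predictions."""
    hits = 0
    for subject, gold in facts:
        candidates = rank_fn(to_cloze(subject, template))[:k]
        hits += gold in candidates
    return hits / len(facts)


def toy_rank(query: str) -> List[str]:
    """Hypothetical stand-in for a language model: returns ranked fillers."""
    table = {
        "Dante was born in [MASK].": ["Florence", "Rome"],
        "Mozart was born in [MASK].": ["Vienna", "Salzburg"],
    }
    return table.get(query, [])


facts = [("Dante", "Florence"), ("Mozart", "Salzburg")]
template = "[X] was born in [Y]."

print(to_cloze("Dante", template))
print(precision_at_k(facts, template, toy_rank, k=1))
print(precision_at_k(facts, template, toy_rank, k=2))
```

In practice `rank_fn` would wrap a masked language model's ranked vocabulary predictions for the masked position; the lookup table here only keeps the sketch self-contained.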

Prompt Tuning or Fine-Tuning - Investigating Relational Knowledge in Pre-Trained Language Models

This work performs an adaptive fine-tuning of the pre-trained language model on the standard fill-mask task using a small training dataset of existing facts from a knowledge graph to show that even fewer training relations are needed to achieve high knowledge extraction quality.

REALM: Retrieval-Augmented Language Model Pre-Training

The effectiveness of Retrieval-Augmented Language Model pre-training (REALM) is demonstrated by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA), where it is found to outperform all previous methods by a significant margin while also providing qualitative benefits such as interpretability and modularity.

LM-CORE: Language Models with Contextually Relevant External Knowledge

Experimental results show that LM-CORE, having access to external knowledge, significantly and robustly outperforms state-of-the-art knowledge-enhanced language models on knowledge probing tasks, can effectively handle knowledge updates, and performs well on two downstream tasks.

A Survey of Knowledge-Enhanced Pre-trained Language Models

A comprehensive review of Knowledge-Enhanced Pre-trained Language Models (KE-PLMs) is presented to provide a clear insight into this thriving industry, with separate taxonomies for Natural Language Understanding (NLU) and Natural Language Generation (NLG) to highlight the focus of these two kinds of tasks.

Relational World Knowledge Representation in Contextual Language Models: A Review

This work proposes to organize knowledge representation strategies in LMs by the level of KB supervision provided, from no KB supervision at all to entity- and relation-level supervision, and provides a high-level, extensible taxonomy for knowledge representation in LMs.

LM-KBC: Knowledge Base Construction from Pre-trained Language Models

The authors present a system that performs task-specific pre-training of BERT, employs prompt decomposition for progressive generation of candidate objects, and uses adaptive thresholds for final candidate object selection.

Towards Continual Knowledge Learning of Language Models

This work constructs a new benchmark and metric to quantify the retention of time-invariant world knowledge, the update of outdated knowledge, and the acquisition of new knowledge in Continual Knowledge Learning.

Language Models are Open Knowledge Graphs

This paper shows how to construct knowledge graphs (KGs) from pre-trained language models (e.g., BERT, GPT-2/3), without human supervision, and proposes an unsupervised method to cast the knowledge contained within language models into KGs.

An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

The Efficient Memory-Augmented Transformer (EMAT) is proposed: it encodes external knowledge into a key-value memory and exploits fast maximum inner product search for memory querying. It runs substantially faster across the board and produces more accurate results on WoW and ELI5.
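
As a rough illustration of querying a key-value memory by maximum inner product, the sketch below scores every key against a query vector and returns the values attached to the highest-scoring keys. It is not EMAT's implementation: the exhaustive scan stands in for a fast approximate search index, and the vectors, the stored strings, and the `mips_lookup` name are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mips_lookup(
    query: Sequence[float],
    keys: List[Sequence[float]],
    values: List[str],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the top_k (value, score) pairs ranked by inner product with the query.

    Brute-force scan over all keys; a real system would use an approximate index.
    """
    order = sorted(range(len(keys)), key=lambda i: dot(query, keys[i]), reverse=True)
    return [(values[i], dot(query, keys[i])) for i in order[:top_k]]


# Hypothetical memory: each key vector is paired with a stored piece of knowledge.
keys = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
values = ["fact A", "fact B", "fact C"]

for value, score in mips_lookup([0.9, 0.1], keys, values, top_k=2):
    print(value, round(score, 2))
```

The retrieved values would then be passed back to the model as additional context; how that integration is done is outside the scope of this sketch.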

On Effectively Learning of Knowledge in Continual Pre-training

Two solutions are developed to help the model learn more knowledge from unstructured text in a fully self-supervised manner, motivated by the observation that models are likely to give wrong predictions on K-B tokens and to pay less attention to those tokens inside the self-attention module.



Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Improving Language Understanding by Generative Pre-Training

The general task-agnostic model outperforms discriminatively trained models that use architectures specifically crafted for each task, improving upon the state of the art in 9 out of the 12 tasks studied.

T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples

T-REx, a dataset of large scale alignments between Wikipedia abstracts and Wikidata triples, is presented, which is two orders of magnitude larger than the largest available alignments dataset and covers 2.5 times more predicates.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

GLUE comprises a benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models. The benchmark favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge transfer across tasks.

CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

This work presents CommonsenseQA: a challenging new dataset for commonsense question answering, which extracts from ConceptNet multiple target concepts that have the same semantic relation to a single source concept.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT, a new language representation model, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. It can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Context-Aware Representations for Knowledge Base Relation Extraction

It is demonstrated that for sentence-level relation extraction it is beneficial to consider other relations in the sentential context while predicting the target relation and to combine the context representations with an attention mechanism to make the final prediction.

A Survey of Reinforcement Learning Informed by Natural Language

It is argued that the time is right to investigate a tight integration of natural language understanding into reinforcement learning, and the state of the field is surveyed, including work on instruction following, text games, and learning from textual domain knowledge.

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

There is substantial room for improvement in NLI systems, and the HANS dataset, which contains many examples where the heuristics fail, can motivate and measure progress in this area.

Dissecting Contextual Word Embeddings: Architecture and Representation

There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.