The Effectiveness of Masked Language Modeling and Adapters for Factual Knowledge Injection

  • Sondre Wold
  • Published 3 October 2022
  • Computer Science
  • ArXiv
This paper studies the problem of injecting factual knowledge into large pre-trained language models. We train adapter modules on parts of the ConceptNet knowledge graph using the masked language modeling objective and evaluate the success of the method through a series of probing experiments on the LAMA probe. Mean P@k curves for different configurations indicate that the technique is effective, increasing performance on subsets of the LAMA probe for large values of k by adding as little as 2…
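The LAMA probe scores a masked-token query by whether the gold token appears among the model's top-k predictions, and mean P@k averages this over all queries. A minimal sketch of that metric in pure Python (the function names and toy data below are illustrative, not taken from the paper):

```python
def precision_at_k(ranked_predictions, gold, k):
    """1.0 if the gold token appears among the top-k predictions, else 0.0."""
    return 1.0 if gold in ranked_predictions[:k] else 0.0

def mean_precision_at_k(queries, k):
    """Average P@k over (ranked_predictions, gold_token) probe queries."""
    scores = [precision_at_k(preds, gold, k) for preds, gold in queries]
    return sum(scores) / len(scores)

# Toy probe set: one hit at rank 1, one miss entirely.
queries = [
    (["paris", "lyon", "rome"], "paris"),
    (["oslo", "bergen"], "stockholm"),
]
print(mean_precision_at_k(queries, 1))  # 0.5
```

Plotting this mean over a range of k values yields the P@k curves the abstract refers to.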

Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers

A deeper analysis reveals that the adapter-based models substantially outperform BERT on inference tasks that require the type of conceptual knowledge explicitly present in ConceptNet and its corresponding Open Mind Common Sense corpus.

LM-CORE: Language Models with Contextually Relevant External Knowledge

Experimental results show that LM-CORE, having access to external knowledge, significantly and robustly outperforms state-of-the-art knowledge-enhanced language models on knowledge probing tasks; can effectively handle knowledge updates; and performs well on two downstream tasks.

AdapterHub: A Framework for Adapting Transformers

AdapterHub is proposed, a framework that allows dynamic "stitching-in" of pre-trained adapters for different tasks and languages, enabling scalable and easy sharing of task-specific models, particularly in low-resource scenarios.
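The adapters this framework stitches in are typically small bottleneck layers inserted inside each transformer block: a down-projection, a nonlinearity, an up-projection, and a residual connection, with only the adapter weights trained. A minimal NumPy sketch under those assumptions (dimensions and initialization are illustrative):

```python
import numpy as np

def bottleneck_adapter(h, w_down, w_up):
    """Down-project, apply ReLU, up-project, then add the residual input."""
    z = np.maximum(h @ w_down, 0.0)  # (batch, bottleneck)
    return h + z @ w_up              # (batch, hidden)

rng = np.random.default_rng(0)
hidden, bottleneck = 16, 4           # bottleneck << hidden keeps adapters small
h = rng.standard_normal((2, hidden))
w_down = rng.standard_normal((hidden, bottleneck)) * 0.01
w_up = np.zeros((bottleneck, hidden))  # zero-init up-projection: adapter starts as identity
print(bottleneck_adapter(h, w_down, w_up).shape)  # (2, 16)
```

Zero-initializing the up-projection means the adapter initially passes the hidden states through unchanged, so inserting it does not disturb the pre-trained model before training begins.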

K-BERT: Enabling Language Representation with Knowledge Graph

This work proposes a knowledge-enabled language representation model (K-BERT) that injects knowledge-graph (KG) triples into sentences as domain knowledge; it significantly outperforms BERT and shows promising results on twelve NLP tasks.

SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining

SMedBERT is a medical PLM trained on large-scale medical corpora that incorporates deep structured semantic knowledge from the neighbours of linked entities; a mention-neighbour hybrid attention is proposed to learn heterogeneous-entity information, infusing the semantic representations of entity types into the homogeneous neighbouring entity structure.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

A new version of the linked open data resource ConceptNet is presented that is particularly well suited to be used with modern NLP techniques such as word embeddings, with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

This work proposes a new model, QA-GNN, which addresses the problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs) through two key innovations: relevance scoring and joint reasoning.

Knowledge Enhanced Contextual Word Representations

After integrating WordNet and a subset of Wikipedia into BERT, the knowledge-enhanced BERT (KnowBert) demonstrates improved perplexity, ability to recall facts as measured in a probing task, and downstream performance on relationship extraction, entity typing, and word sense disambiguation.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

It is found that BERT was significantly undertrained and can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.