CoLAKE: Contextualized Language and Knowledge Embedding

@inproceedings{Sun2020CoLAKECL,
  title={CoLAKE: Contextualized Language and Knowledge Embedding},
  author={Tianxiang Sun and Yunfan Shao and Xipeng Qiu and Qipeng Guo and Yaru Hu and Xuanjing Huang and Zheng Zhang},
  booktitle={COLING},
  year={2020}
}
With the emerging branch of incorporating factual knowledge into pre-trained language models such as BERT, most existing models consider shallow, static, and separately pre-trained entity embeddings, which limits the performance gains of these models. Few works explore the potential of deep contextualized knowledge representation when injecting knowledge. In this paper, we propose the Contextualized Language and Knowledge Embedding (CoLAKE), which jointly learns contextualized representation… 
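The truncated abstract omits the construction details, but the central idea is to place a sentence and the knowledge-graph neighbourhood of its linked entities into one unified word-knowledge graph and train a Transformer over all node types jointly. The sketch below shows one plausible way such a graph and its induced attention mask could be built; the function name, toy sentence, and triples are illustrative assumptions, not the authors' actual code.

# Illustrative sketch (not the authors' code) of building a unified
# word-knowledge (WK) graph for a CoLAKE-style model: word nodes form a
# fully connected sequence, and a linked entity mention is expanded with
# (relation, tail-entity) nodes drawn from the knowledge graph. The graph
# is returned as a node list plus an adjacency matrix that could serve as
# a Transformer attention mask. `build_wk_graph`, the toy sentence, and
# the triples are hypothetical.
import numpy as np

def build_wk_graph(tokens, mention_index, entity, triples):
    """tokens: word tokens of the sentence; mention_index: position of the
    entity mention; entity: the linked entity; triples: (head, relation,
    tail) facts about that entity."""
    nodes = list(tokens)                 # word nodes come first
    n_words = len(nodes)
    edges = [(i, j) for i in range(n_words) for j in range(n_words) if i != j]

    anchor = mention_index               # mention position doubles as the entity node
    for head, rel, tail in triples:
        assert head == entity, "only triples about the linked entity"
        rel_id = len(nodes); nodes.append(rel)    # relation node
        tail_id = len(nodes); nodes.append(tail)  # tail-entity node
        # anchor <-> relation <-> tail, mirroring the KG triple path
        edges += [(anchor, rel_id), (rel_id, anchor),
                  (rel_id, tail_id), (tail_id, rel_id)]

    adj = np.zeros((len(nodes), len(nodes)), dtype=bool)
    for i, j in edges:
        adj[i, j] = True
    np.fill_diagonal(adj, True)          # self-loops so every node sees itself
    return nodes, adj

# Toy example: mask a word and let knowledge nodes provide context.
tokens = ["Harry_Potter", "is", "written", "by", "[MASK]"]
triples = [("Harry_Potter", "author", "J._K._Rowling"),
           ("Harry_Potter", "genre", "Fantasy")]
nodes, adj = build_wk_graph(tokens, 0, "Harry_Potter", triples)
print(nodes)              # word nodes followed by relation / entity nodes
print(adj.astype(int))    # boolean mask -> 0/1 attention mask

Under a setup like this, masked nodes of any type (word, entity, or relation) can be predicted jointly, which is one way the knowledge representation becomes contextualized rather than static.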


DKPLM: Decomposable Knowledge-enhanced Pre-trained Language Model for Natural Language Understanding
TLDR
A novel KEPLM named DKPLM is proposed that decomposes the knowledge injection process of pre-trained language models across the pre-training, fine-tuning, and inference stages, which facilitates the application of KEPLMs in real-world scenarios and yields higher inference speed than competing models thanks to the decomposing mechanism.
HORNET: Enriching Pre-trained Language Representations with Heterogeneous Knowledge Sources
TLDR
A novel KEPLM named HORNET is proposed, which integrates Heterogeneous knowledge from various structured and unstructured sources into the RoBERTa NETwork, thereby taking full advantage of both linguistic and factual knowledge simultaneously, and designs a hybrid attention heterogeneous graph convolution network (HaHGCN).
Enhancing Language Models with Plug-and-Play Large-Scale Commonsense
TLDR
A plug-and-play method for large-scale commonsense integration without further pre-training is proposed, inspired by the observation that when fine-tuning LMs for downstream tasks without external knowledge, the variation in the parameter space is minor.
KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs
TLDR
A novel knowledge-aware language model framework based on the fine-tuning process is proposed, which equips the PLM with a unified knowledge-enhanced text graph containing both text and multi-relational sub-graphs extracted from a KG, and designs a hierarchical relational-graph-based message-passing mechanism.
Knowledge Enhanced Pretrained Language Models: A Comprehensive Survey
TLDR
A comprehensive survey of the literature on the emerging and fast-growing field of Knowledge Enhanced Pretrained Language Models (KE-PLMs) is provided, and three taxonomies are introduced to categorize existing work.
K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering
TLDR
K-AID is proposed, a systematic approach that includes a low-cost process for acquiring domain knowledge, an effective knowledge infusion module for improving model performance, and a knowledge distillation component for reducing model size and deploying K-PLMs on resource-restricted devices for real-world applications.
SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining
TLDR
In SMedBERT, a medical PLM trained on large-scale medical corpora that incorporates deep structured semantic knowledge from the neighbours of linked entities, a mention-neighbour hybrid attention is proposed to learn heterogeneous-entity information, which infuses the semantic representations of entity types into the homogeneous neighbouring entity structure.
ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning
TLDR
A novel contrastive learning framework named ERICA is proposed in the pre-training phase to obtain a deeper understanding of the entities and their relations in text; it achieves consistent improvements on several document-level language understanding tasks, including relation extraction and reading comprehension, especially under low-resource settings.
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
TLDR
A unified framework named ERNIE 3.0 is proposed for pre-training large-scale knowledge-enhanced models; it fuses an auto-regressive network with an auto-encoding network so that the trained model can be easily tailored for both natural language understanding and generation tasks with zero-shot learning, few-shot learning, or fine-tuning.
A Survey of Knowledge Enhanced Pre-trained Models
TLDR
This survey provides a comprehensive overview of pre-trained models with knowledge injection, which possess deep understanding and logical reasoning capabilities and introduce a degree of interpretability, and discusses potential directions of KEPTMs for future research.

References

Showing 1-10 of 49 references
K-BERT: Enabling Language Representation with Knowledge Graph
TLDR
This work proposes a knowledge-enabled language representation model (K-BERT) with knowledge graphs (KGs), in which triples are injected into sentences as domain knowledge; K-BERT significantly outperforms BERT, demonstrating that it is an excellent choice for solving knowledge-driven problems that require expert knowledge.
ERNIE: Enhanced Language Representation with Informative Entities
TLDR
This paper utilizes both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE) which can take full advantage of lexical, syntactic, and knowledge information simultaneously, and is comparable with the state-of-the-art model BERT on other common NLP tasks.
Barack’s Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling
TLDR
This work introduces the knowledge graph language model (KGLM), a neural language model with mechanisms for selecting and copying facts from a knowledge graph that are relevant to the context, which enable the model to render information it has never seen before and to generate out-of-vocabulary tokens.
Knowledge Enhanced Contextual Word Representations
TLDR
After integrating WordNet and a subset of Wikipedia into BERT, the knowledge-enhanced BERT (KnowBert) demonstrates improved perplexity, improved ability to recall facts as measured in a probing task, and improved downstream performance on relationship extraction, entity typing, and word sense disambiguation.
Language Models as Knowledge Bases?
TLDR
An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.
CoKE: Contextualized Knowledge Graph Embedding
TLDR
Contextualized Knowledge Graph Embedding (CoKE) is presented, a novel paradigm that takes into account such contextual nature, and learns dynamic, flexible, and fully contextualized entity and relation embeddings.
Representation Learning of Knowledge Graphs with Entity Descriptions
TLDR
Experimental results on real-world datasets show that the proposed novel representation learning method for knowledge graphs outperforms other baselines on the two tasks, especially under the zero-shot setting, which indicates that the method is capable of building representations for novel entities from their descriptions.
Integrating Graph Contextualized Knowledge into Pre-trained Language Models
TLDR
Experimental results demonstrate that the proposed model achieves state-of-the-art performance on several medical NLP tasks, and the improvement over MedERNIE indicates that graph contextualized knowledge is beneficial.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Embedding Entities and Relations for Learning and Inference in Knowledge Bases
TLDR
It is found that embeddings learned from the bilinear objective are particularly good at capturing relational semantics and that the composition of relations is characterized by matrix multiplication.
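For concreteness, the bilinear objective referred to above scores a triple as f(h, r, t) = h^T M_r t, and the composition finding says that chaining two relations corresponds (approximately) to multiplying their matrices. A minimal numeric illustration with made-up embeddings, not taken from the paper:

# Assumed toy illustration of bilinear scoring f(h, r, t) = h^T M_r t and
# of relation composition as matrix multiplication (M_r1 @ M_r2).
import numpy as np

rng = np.random.default_rng(0)
d = 4                                   # embedding dimension
h = rng.normal(size=d)                  # head entity embedding
t = rng.normal(size=d)                  # tail entity embedding
M_r1 = rng.normal(size=(d, d))          # e.g. a "born_in" relation matrix
M_r2 = rng.normal(size=(d, d))          # e.g. a "located_in" relation matrix

def score(head, M_rel, tail):
    """Bilinear score of a triple; higher means more plausible."""
    return head @ M_rel @ tail

# A composite relation (e.g. born_in followed by located_in) is modelled
# by the product of the two relation matrices.
M_comp = M_r1 @ M_r2
print(score(h, M_r1, t), score(h, M_comp, t))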