LinkBERT: Pretraining Language Models with Document Links

@inproceedings{yasunaga2022linkbert,
  title={LinkBERT: Pretraining Language Models with Document Links},
  author={Michihiro Yasunaga and Jure Leskovec and Percy Liang},
  booktitle={Association for Computational Linguistics (ACL)},
  year={2022}
}
Language model (LM) pretraining captures various knowledge from text corpora, helping downstream tasks. However, existing methods such as BERT model a single document, and do not capture dependencies or knowledge that span across documents. In this work, we propose LinkBERT, an LM pretraining method that leverages links between documents, e.g., hyperlinks. Given a text corpus, we view it as a graph of documents and create LM inputs by placing linked documents in the same context. We then… 
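The input-construction idea in the abstract can be sketched in a few lines. This is an illustrative sketch only, not the authors' released code; `corpus`, `links`, and the pairing logic are assumptions for demonstration.

```python
import random

def make_lm_inputs(corpus, links, seed=0):
    """Sketch of the LinkBERT idea: place linked documents in the same
    LM input context so the model can learn cross-document dependencies.

    corpus: dict mapping doc_id -> text
    links:  dict mapping doc_id -> list of linked doc_ids (e.g., hyperlinks)
    """
    rng = random.Random(seed)
    inputs = []
    for doc_id, text in corpus.items():
        linked = links.get(doc_id)
        if linked:
            # Pair the document with one of its hyperlinked neighbors.
            other = corpus[rng.choice(linked)]
        else:
            # No links: fall back to pairing with a random document.
            other = corpus[rng.choice(list(corpus))]
        inputs.append("[CLS] " + text + " [SEP] " + other + " [SEP]")
    return inputs

corpus = {"a": "Doc A text.", "b": "Doc B text.", "c": "Doc C text."}
links = {"a": ["b"]}
print(make_lm_inputs(corpus, links)[0])
# → [CLS] Doc A text. [SEP] Doc B text. [SEP]
```

In the paper these two-segment inputs then feed standard masked language modeling, plus a document-relation prediction objective over the segment pair.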


Pre-training for Information Retrieval: Are Hyperlinks Fully Explored?

A progressive hyperlink prediction (PHP) framework is proposed to explore the utilization of hyperlinks in pretraining; experimental results on two large-scale ad-hoc retrieval datasets and six question-answering datasets demonstrate its superiority over existing pretraining methods.

Optimizing Bi-Encoder for Named Entity Recognition via Contrastive Learning

We present an efficient bi-encoder framework for named entity recognition (NER), which applies contrastive learning to map candidate text spans and entity types into the same vector representation space.

ParTNER: Paragraph Tuning for Named Entity Recognition on Clinical Cases in Spanish using mBERT + Rules

This work presents a transfer learning approach starting from multilingual BERT to tackle the problem of Spanish NER (species) and normalization in clinical cases by using sentence tokenization for training and a paragraph tuning strategy at the inference phase.

LED down the rabbit hole: exploring the potential of global attention for biomedical multi-document summarisation

This paper adapts PRIMERA (Xiao et al., 2022) to the biomedical domain by placing global attention on important biomedical entities in several ways, and analyses the outputs of the 23 resulting models.

Win-Win Cooperation: Bundling Sequence and Span Models for Named Entity Recognition

Experimental results indicate that BL consistently enhances their performance, suggesting that it is possible to construct a new SOTA NER system by incorporating BL into the current SOTA system and reducing both entity boundary and type prediction errors.

Exploring Biomedical Question Answering with BioM-Transformers At BioASQ10B challenge: Findings and Techniques

This paper extends the investigation of biomedical Question Answering models with BioM-Transformers by broadening the grid search over hyper-parameters and addressing the limited size of the BioASQ10B-Factoid training set by merging it with the List training set.

A Computational Inflection for Scientific Discovery

The confluence of societal and computational trends suggests that computer science is poised to ignite a revolution in the scientific process itself.

Structure Inducing Pre-Training

Relative reduction of error (RRE) comparisons against published per-token and per-sample baselines indicate that models trained under the proposed framework reduce error more, and thus outperform those baselines.
Cross-Document Language Modeling

The cross-document language model (CD-LM) improves masked language modeling for multi-document NLP tasks with two key ideas: pretraining with multiple related documents in a single input, and cross-document masking, which encourages the model to learn cross-document and long-range relationships.

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

It is shown that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.

Language Models as Knowledge Bases?

An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.

HTLM: Hyper-Text Pre-Training and Prompting of Language Models

It is shown that pretraining with a BART-style denoising loss directly on simplified HTML provides highly effective transfer for a wide range of end tasks and supervision levels, and that HTLM is highly effective at autoprompting itself.

KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

KEPLER, a unified model for Knowledge Embedding and Pre-trained Language Representation, is proposed; it not only better integrates factual knowledge into PLMs but also produces effective text-enhanced knowledge embeddings using strong PLMs.

ERNIE: Enhanced Language Representation with Informative Entities

This paper utilizes both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE) which can take full advantage of lexical, syntactic, and knowledge information simultaneously, and is comparable with the state-of-the-art model BERT on other common NLP tasks.

CoLAKE: Contextualized Language and Knowledge Embedding

The Contextualized Language and Knowledge Embedding (CoLAKE) is proposed, which jointly learns contextualized representations for both language and knowledge with an extended MLM objective. It achieves surprisingly high performance on a synthetic task called word-knowledge graph completion, which shows the benefit of simultaneously contextualizing language and knowledge representations.

SciBERT: A Pretrained Language Model for Scientific Text

SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.

GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

This work proposes GREASELM, a new model that fuses encoded representations from pretrained LMs and graph neural networks over multiple layers of modality interaction operations, allowing language context representations to be grounded by structured world knowledge, and allowing linguistic nuances in the context to inform the graph representations of knowledge.

Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need

This paper proposes to leverage the large-scale hyperlinks and anchor texts to pre-train the language model for ad-hoc retrieval, and develops the Transformer model to predict the pair-wise preference, jointly with the Masked Language Model objective.