• Corpus ID: 211204736

REALM: Retrieval-Augmented Language Model Pre-Training

  title={REALM: Retrieval-Augmented Language Model Pre-Training},
  author={Kelvin Guu and Kenton Lee and Zora Tung and Panupong Pasupat and Ming-Wei Chang},
Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus… 

Figures and Tables from this paper

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

A general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation, and finds that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.

RetroNLU: Retrieval Augmented Task-Oriented Semantic Parsing

The technique, RetroNLU, extends a sequence-to-sequence model architecture with a retrieval component, which is used to retrieve existing similar samples and present them as an additional context to the model to outperform the baseline method by 1.5% absolute macro-F1.

An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

The Efficient Memory-Augmented Transformer (EMAT) is proposed – it encodes external knowledge into a key-value memory and exploits the fast maximum inner product search for memory querying and runs substantially faster across the board and produces more accurate results on WoW and ELI5.

ANNA”:" Enhanced Language Representation for Question Answering

This paper proposes an extended pre- training task, and a new neighbor-aware mechanism that attends neighboring tokens more to capture the richness of context for pre-training language modeling.

Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval

Recent research demonstrates the effectiveness of using fine-tuned language models (LM) for dense retrieval. However, dense retrievers are hard to train, typically requiring heavily engineered

Few-shot Learning with Retrieval Augmented Language Models

Atlas is presented, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples, and the impact of the content of the document index is studied, showing that it can easily be updated.

How Context Affects Language Models' Factual Predictions

This paper reports that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.

Learning Dense Representations of Phrases at Scale

This work shows for the first time that it can learn dense representations of phrases alone that achieve much stronger performance in open-domain QA and proposes a query-side fine-tuning strategy, which can support transfer learning and reduce the discrepancy between training and inference.

LM-CORE: Language Models with Contextually Relevant External Knowledge

Experimental results show that LM-CORE, having access to external knowledge, achieves signif-icant and robust outperformance over state-of-the-art knowledge-enhanced language models on knowledge probing tasks; can effectively handle knowledge updates; and performs well on two downstream tasks.

Studying Strategically: Learning to Mask for Closed-book QA

This paper aims to learn the optimal masking strategy for the intermediate pretraining stage, and first train the masking policy to extract spans that are likely to be tested, using supervision from the downstream task itself, then deploy the learned policy during intermediate pre-training.



Language Models as Knowledge Bases?

An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Knowledge Enhanced Contextual Word Representations

After integrating WordNet and a subset of Wikipedia into BERT, the knowledge enhanced BERT (KnowBert) demonstrates improved perplexity, ability to recall facts as measured in a probing task and downstream performance on relationship extraction, entity typing, and word sense disambiguation.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Learning Recurrent Span Representations for Extractive Question Answering

This paper presents a novel model architecture that efficiently builds fixed length representations of all spans in the evidence document with a recurrent network, and shows that scoring explicit span representations significantly improves performance over other approaches that factor the prediction into separate predictions about words or start and end markers.

Latent Retrieval for Weakly Supervised Open Domain Question Answering

It is shown for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system, and outperforming BM25 by up to 19 points in exact match.

Skip-Thought Vectors

We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the

End-To-End Memory Networks

A neural network with a recurrent attention model over a possibly large external memory that is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.

A Retrieve-and-Edit Framework for Predicting Structured Outputs

This work proposes an approach that first retrieves a training example based on the input and then edits it to the desired output, and shows that on a new autocomplete task for GitHub Python code and the Hearthstone cards benchmark, retrieve-and-edit significantly boosts the performance of a vanilla sequence-to-sequence model on both tasks.