REALM: Retrieval-Augmented Language Model Pre-Training
@article{Guu2020REALMRL,
  title={REALM: Retrieval-Augmented Language Model Pre-Training},
  author={Kelvin Guu and Kenton Lee and Zora Tung and Panupong Pasupat and Ming-Wei Chang},
  journal={ArXiv},
  year={2020},
  volume={abs/2002.08909}
}

Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts.
To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus…
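The latent-retriever idea sketched in the abstract can be illustrated concretely: score documents by an inner product with the query (so retrieval reduces to maximum inner product search), treat the softmax over those scores as a retrieval distribution p(z|x), and marginalize the per-document predictions to get p(y|x). The following is a toy numpy sketch of that formulation, not the paper's implementation; all embeddings and shapes here are made-up illustrations.

```python
import numpy as np

def retrieve_top_k(query_vec, doc_vecs, k=2):
    # Relevance is an inner product between query and document embeddings,
    # so retrieval reduces to maximum inner product search (MIPS).
    scores = doc_vecs @ query_vec
    top = np.argsort(-scores)[:k]
    # Softmax over retrieved scores approximates p(z | x), the retrieval distribution.
    probs = np.exp(scores[top] - scores[top].max())
    probs /= probs.sum()
    return top, probs

def marginal_prediction(p_y_given_xz, retrieval_probs):
    # p(y | x) = sum_z p(y | x, z) * p(z | x): marginalize over retrieved documents.
    return (p_y_given_xz * retrieval_probs[:, None]).sum(axis=0)

rng = np.random.default_rng(0)
docs = rng.normal(size=(5, 4))               # toy document embeddings
query = rng.normal(size=4)                   # toy query embedding
top, p_z = retrieve_top_k(query, docs, k=2)
p_y_xz = rng.dirichlet(np.ones(3), size=2)   # per-document answer distributions
p_y = marginal_prediction(p_y_xz, p_z)       # final answer distribution
```

Because p(z|x) is differentiable in the embeddings, gradients can flow into the retriever during pre-training; a production system replaces the brute-force scan with an approximate MIPS index.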
593 Citations
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- 2020
Computer Science
NeurIPS
A general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models that combine pre-trained parametric and non-parametric memory for language generation -- finding that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
RetroNLU: Retrieval Augmented Task-Oriented Semantic Parsing
- 2022
Computer Science
NLP4CONVAI
The technique, RetroNLU, extends a sequence-to-sequence model architecture with a retrieval component, which is used to retrieve existing similar samples and present them as an additional context to the model to outperform the baseline method by 1.5% absolute macro-F1.
"ANNA": Enhanced Language Representation for Question Answering
- 2022
Computer Science
REPL4NLP
This paper proposes an extended pre-training task, and a new neighbor-aware mechanism that attends more to neighboring tokens to capture the richness of context for pre-training language modeling.
Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval
- 2022
Computer Science
ACL
Recent research demonstrates the effectiveness of using fine-tuned language models (LM) for dense retrieval. However, dense retrievers are hard to train, typically requiring heavily engineered…
How Context Affects Language Models' Factual Predictions
- 2020
Computer Science
AKBC
This paper reports that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading baseline.
An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks
- 2022
Computer Science
EMNLP
The Efficient Memory-Augmented Transformer (EMAT) encodes external knowledge into a key-value memory and exploits fast maximum inner product search for memory querying, producing more accurate results on WoW and ELI5.
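The key-value memory pattern described in the EMAT summary can be sketched in a few lines: store dense key vectors alongside arbitrary value payloads, and answer a query via maximum inner product search over the keys. This is a toy brute-force illustration, not EMAT's implementation; the keys, values, and class name below are invented for the example.

```python
import numpy as np

class KeyValueMemory:
    """Toy key-value memory: dense key vectors paired with value payloads."""

    def __init__(self, keys, values):
        self.keys = np.asarray(keys, dtype=float)
        self.values = values

    def query(self, q, k=1):
        # Maximum inner product search over the key matrix; a real system
        # would use an approximate-search index rather than a brute-force scan.
        scores = self.keys @ np.asarray(q, dtype=float)
        top = np.argsort(-scores)[:k]
        return [self.values[i] for i in top], scores[top]

mem = KeyValueMemory(
    keys=[[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]],
    values=["fact A", "fact B", "fact C"],
)
hits, scores = mem.query([0.9, 0.1], k=2)  # → ["fact A", "fact C"]
```

Separating keys (what is matched) from values (what is returned) is what lets the memory hold pre-computed knowledge representations that the transformer can consult cheaply at inference time.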
LM-CORE: Language Models with Contextually Relevant External Knowledge
- 2022
Computer Science
NAACL-HLT
Experimental results show that LM-CORE, having access to external knowledge, significantly and robustly outperforms state-of-the-art knowledge-enhanced language models on knowledge probing tasks, can effectively handle knowledge updates, and performs well on two downstream tasks.
Learning Dense Representations of Phrases at Scale
- 2021
Computer Science
ACL
This work shows for the first time that it can learn dense representations of phrases alone that achieve much stronger performance in open-domain QA and proposes a query-side fine-tuning strategy, which can support transfer learning and reduce the discrepancy between training and inference.
Few-shot Learning with Retrieval Augmented Language Models
- 2022
Computer Science
ArXiv
This work presents Atlas, a carefully designed and pre-trained retrieval augmented language model able to learn knowledge intensive tasks with very few training examples, and studies the impact of the content of the document index, showing that it can easily be updated.
Studying Strategically: Learning to Mask for Closed-book QA
- 2020
Computer Science
ArXiv
This paper first trains an optimal masking policy to extract spans that are likely to be tested, using supervision from the downstream task itself, then deploys the learned policy during intermediate pre-training, which outperforms strong heuristics when used to pre-train BART.
42 References
Language Models as Knowledge Bases?
- 2019
Computer Science
EMNLP
An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.
Language Models are Unsupervised Multitask Learners
- 2019
Computer Science
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- 2020
Computer Science
J. Mach. Learn. Res.
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
Knowledge Enhanced Contextual Word Representations
- 2019
Computer Science
EMNLP
After integrating WordNet and a subset of Wikipedia into BERT, the knowledge enhanced BERT (KnowBert) demonstrates improved perplexity, ability to recall facts as measured in a probing task and downstream performance on relationship extraction, entity typing, and word sense disambiguation.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- 2019
Computer Science
NAACL
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Learning Recurrent Span Representations for Extractive Question Answering
- 2016
Computer Science
ArXiv
This paper presents a novel model architecture that efficiently builds fixed length representations of all spans in the evidence document with a recurrent network, and shows that scoring explicit span representations significantly improves performance over other approaches that factor the prediction into separate predictions about words or start and end markers.
Latent Retrieval for Weakly Supervised Open Domain Question Answering
- 2019
Computer Science
ACL
It is shown for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs and without any IR system, outperforming BM25 by up to 19 points in exact match.
Skip-Thought Vectors
- 2015
Computer Science
NIPS
We describe an approach for unsupervised learning of a generic, distributed sentence encoder. Using the continuity of text from books, we train an encoder-decoder model that tries to reconstruct the…
End-To-End Memory Networks
- 2015
Computer Science
NIPS
A neural network with a recurrent attention model over a possibly large external memory that is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings.
A Retrieve-and-Edit Framework for Predicting Structured Outputs
- 2018
Computer Science
NeurIPS
This work proposes an approach that first retrieves a training example based on the input and then edits it to the desired output, and shows that on a new autocomplete task for GitHub Python code and the Hearthstone cards benchmark, retrieve-and-edit significantly boosts the performance of a vanilla sequence-to-sequence model on both tasks.