Corpus ID: 229924110

Studying Strategically: Learning to Mask for Closed-book QA

Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Xiang Ren, Wen-tau Yih, Madian Khabsa
Closed-book question answering (QA) is a challenging task that requires a model to answer questions directly, without access to external knowledge. Directly fine-tuning pre-trained language models on (question, answer) examples has been shown to yield surprisingly competitive performance, which improves further when an intermediate pre-training stage is inserted between general pre-training and fine-tuning. Prior work used a heuristic during this intermediate stage, whereby named… 
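The heuristic masking the abstract alludes to, replacing salient spans such as named entities with a mask token before intermediate pre-training, can be sketched minimally as follows. The `mask_salient_spans` helper and the `[MASK]` token are illustrative only; in practice the entity spans would come from an upstream NER tagger, which is assumed here.

```python
import re

MASK = "[MASK]"

def mask_salient_spans(text: str, spans: list[str]) -> str:
    """Replace each salient span (e.g., a named entity or date) with a
    mask token. This sketch does literal string replacement; a real
    pipeline would operate on tokenized character offsets from an NER
    tagger instead.
    """
    for span in spans:
        text = re.sub(re.escape(span), MASK, text)
    return text

masked = mask_salient_spans(
    "Marie Curie won the Nobel Prize in 1903.",
    ["Marie Curie", "1903"],
)
# masked == "[MASK] won the Nobel Prize in [MASK]."
```

The model is then trained to reconstruct the masked spans, which encourages it to store the kind of factual knowledge that closed-book QA probes.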


On the Influence of Masking Policies in Intermediate Pre-training
This paper investigates the effects of using heuristic, directly supervised, and meta-learned MLM policies for intermediate pre-training on eight selected tasks across three categories (closed-book QA, knowledge-intensive language tasks, and abstractive summarization).
Read before Generate! Faithful Long Form Question Answering with Machine Reading
A new end-to-end framework is proposed that jointly models answer generation and machine reading with an emphasis on faithful facts and state-of-the-art results on two LFQA datasets demonstrate the effectiveness of the method, in comparison with strong baselines on automatic and human evaluation metrics.
COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
A self-supervised learning framework that pretrains language models by COrrecting and COntrasting corrupted text sequences, which not only outperforms recent state-of-the-art pretrained models in accuracy, but also improves pretraining efficiency.
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them
A new QA-pair retriever, RePAQ, is introduced to complement PAQ; it is found that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models while being significantly faster.


REALM: Retrieval-Augmented Language Model Pre-Training
The effectiveness of Retrieval-Augmented Language Model pre-training (REALM) is demonstrated by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA) and is found to outperform all previous methods by a significant margin, while also providing qualitative benefits such as interpretability and modularity.
Train No Evil: Selective Masking for Task-guided Pre-training
Experimental results on two sentiment analysis tasks show that the proposed selective masking task-guided pre-training method can achieve comparable or even better performance with less than 50% overall computation cost, which indicates the method is both effective and efficient.
Language Models as Knowledge Bases?
An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
This work proposes pre-training large Transformer-based encoder-decoder models on massive text corpora with a new self-supervised objective, PEGASUS, and demonstrates it achieves state-of-the-art performance on all 12 downstream datasets measured by ROUGE scores.
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
The contextual representations learned by the proposed replaced token detection pre-training task substantially outperform the ones learned by methods such as BERT and XLNet given the same model size, data, and compute.
Unsupervised Commonsense Question Answering with Self-Talk
An unsupervised framework based on self-talk as a novel alternative to multiple-choice commonsense tasks, inspired by inquiry-based discovery learning, which improves performance on several benchmarks and competes with models that obtain knowledge from external KBs.
Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation
The Neural Mask Generator is validated on several question answering and text classification datasets using BERT and DistilBERT as the language models, on which it outperforms rule-based masking strategies, by automatically learning optimal adaptive maskings.
SpanBERT: Improving Pre-training by Representing and Predicting Spans
The approach extends BERT by masking contiguous random spans, rather than random tokens, and training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it.
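The contiguous-span masking SpanBERT describes, sampling span lengths from a geometric distribution and masking whole runs of tokens, might be sketched as below. The function name, parameter defaults, and `[MASK]` token are hypothetical, not the paper's actual code; spans are allowed to touch for simplicity.

```python
import random

def mask_contiguous_spans(tokens, mask_ratio=0.15, p=0.2, max_len=10, seed=0):
    """SpanBERT-style masking sketch: repeatedly sample a span length from
    a Geometric(p) distribution (clipped at max_len), pick a random start,
    and mask that contiguous run, until roughly mask_ratio of the tokens
    are masked."""
    rng = random.Random(seed)
    out = list(tokens)
    budget = max(1, int(len(out) * mask_ratio))
    n_masked = 0
    while n_masked < budget:
        # Sample span length ~ Geometric(p), clipped at max_len.
        length = 1
        while length < max_len and rng.random() > p:
            length += 1
        start = rng.randrange(0, max(1, len(out) - length + 1))
        for i in range(start, start + length):
            if out[i] != "[MASK]":
                out[i] = "[MASK]"
                n_masked += 1
    return out

tokens = [f"tok{i}" for i in range(100)]
masked = mask_contiguous_spans(tokens)
```

Masking spans rather than isolated tokens forces the model to predict multi-token content from span boundaries, which is the training signal SpanBERT's span-boundary objective builds on.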