Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering

@article{Chen2022AugmentingPL,
  title={Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering},
  author={Wenhu Chen and Pat Verga and Michiel de Jong and John Wieting and William Cohen},
  journal={ArXiv},
  year={2022},
  volume={abs/2204.04581}
}
Existing state-of-the-art methods for open-domain question answering (ODQA) generally use an open-book approach, in which information is retrieved from a large text corpus or knowledge base (KB) and then reasoned with to produce an answer. A recent alternative is to retrieve from a collection of previously-generated question-answer pairs. This has several practical advantages, including being more memory- and compute-efficient. Question-answer pairs are also appealing in that they seem to be…
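A minimal sketch of the QA-memory retrieval step the abstract describes: embed the incoming question, find the nearest stored question, and return its answer. The hashing bag-of-words encoder, the toy three-pair memory, and the example questions are illustrative stand-ins for the learned dense encoders and the millions of generated pairs used in practice.

# Toy QA-pair retrieval: embed the question, find the nearest stored
# question, return its answer. The hashing bag-of-words encoder below
# is a stand-in for a trained neural question encoder.
import numpy as np
from collections import Counter

def encode(text, dim=256):
    vec = np.zeros(dim)
    for tok, cnt in Counter(text.lower().split()).items():
        vec[hash(tok) % dim] += cnt
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Pre-generated QA memory (illustrative; PAQ-style systems store millions).
qa_memory = [
    ("who wrote the iliad", "Homer"),
    ("what is the capital of france", "Paris"),
    ("when did world war ii end", "1945"),
]
memory_matrix = np.stack([encode(q) for q, _ in qa_memory])

def answer(question):
    scores = memory_matrix @ encode(question)  # inner-product search
    return qa_memory[int(np.argmax(scores))][1]

print(answer("which city is the capital of france"))  # -> Paris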

Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute

It is shown that LUMEN outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, that it outperforms both for any given compute budget, and that LUMEN's advantage over FiD grows with model size.
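A schematic of the hybrid idea, under the assumption that most encoder layers run offline over each passage (the pre-computed memory) while only a small live encoder runs at query time conditioned on the question. The plain linear layers, layer counts, and shapes are illustrative, not the paper's actual architecture.

# LUMEN-style hybrid sketch: an expensive encoder runs once per passage
# offline; a cheap live encoder runs per query over question + memories.
import numpy as np

rng = np.random.default_rng(0)
D = 64                                   # hidden size (illustrative)

def layer(x, w):                         # stand-in for a transformer layer
    return np.tanh(x @ w)

memory_weights = [rng.normal(0, 0.1, (D, D)) for _ in range(10)]  # big, offline
live_weights = [rng.normal(0, 0.1, (D, D)) for _ in range(2)]     # small, online

def precompute_memory(passage_tokens):
    # Run the expensive encoder once per passage and cache the result.
    h = passage_tokens
    for w in memory_weights:
        h = layer(h, w)
    return h

def answer_time(question_tokens, memories):
    # At query time only the cheap live encoder sees the question.
    h = np.concatenate([question_tokens] + memories, axis=0)
    for w in live_weights:
        h = layer(h, w)
    return h

passages = [rng.normal(size=(20, D)) for _ in range(4)]
memories = [precompute_memory(p) for p in passages]   # one-time offline cost
out = answer_time(rng.normal(size=(8, D)), memories)  # small per-query cost
print(out.shape)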

References

Showing 1-10 of 49 references

PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

A new QA-pair retriever, RePAQ, is introduced to complement PAQ; it is found that PAQ preempts and caches test questions, enabling RePAQ to match the accuracy of recent retrieve-and-read models while being significantly faster.

Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering

Interestingly, it is observed that the performance of this method significantly improves as the number of retrieved passages increases, evidence that sequence-to-sequence models offer a flexible framework to efficiently aggregate and combine evidence from multiple passages.
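A toy schematic of the Fusion-in-Decoder aggregation this summary refers to: each retrieved passage is encoded independently together with the question, the encoder outputs are concatenated, and a single decoder attends over all of them jointly. Linear maps and one attention step stand in for the real T5 encoder and decoder.

# FiD sketch: independent per-passage encoding, joint decoding.
import numpy as np

rng = np.random.default_rng(0)
D = 32
W_enc = rng.normal(0, 0.1, (D, D))
W_dec = rng.normal(0, 0.1, (D, D))

def encode_pair(question, passage):
    # Each (question, passage) pair is encoded independently of the others.
    return np.tanh(np.concatenate([question, passage], axis=0) @ W_enc)

def decode(fused):
    # One cross-attention-like step over the fused encoder states.
    query = rng.normal(size=(1, D))          # toy decoder state
    attn = np.exp(query @ fused.T)
    attn /= attn.sum()
    return (attn @ fused) @ W_dec            # evidence aggregated in the decoder

question = rng.normal(size=(8, D))
passages = [rng.normal(size=(50, D)) for _ in range(10)]
fused = np.concatenate([encode_pair(question, p) for p in passages], axis=0)
print(decode(fused).shape)  # decoder attends over all passages at once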

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.

Know What You Don’t Know: Unanswerable Questions for SQuAD

SQuADRUn is a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.

♫ MuSiQue: Multihop Questions via Single-hop Question Composition

A bottom-up approach is introduced that systematically selects composable pairs of single-hop questions that are connected, that is, where one reasoning step critically relies on information from another, to create a new multihop QA dataset with 25K 2-4 hop questions.
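A toy sketch of the composability check, assuming the "connected" condition reduces to the first question's answer appearing in the second question; the example questions are hypothetical, and the paper's actual pipeline involves additional filtering.

# Bottom-up composition: two single-hop questions compose into a
# 2-hop question when the second critically relies on the first's answer.
hop1 = ("Who directed Inception?", "Christopher Nolan")
hop2 = ("Where was Christopher Nolan born?", "London")

def composable(first, second):
    # "Connected": the first hop's answer appears in the second question.
    return first[1].lower() in second[0].lower()

if composable(hop1, hop2):
    multihop = hop2[0].replace(hop1[1], "the director of Inception")
    print(multihop, "->", hop2[1])
    # Where was the director of Inception born? -> London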

Dense Passage Retrieval for Open-Domain Question Answering

This work shows that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework.
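A minimal sketch of the dual-encoder objective with in-batch negatives, the standard DPR training setup: separate question and passage encoders, inner-product similarity, and a softmax over all passages in the batch with the gold passage on the diagonal. Toy linear encoders stand in for the BERT encoders used in the paper.

# Dual-encoder training sketch with in-batch negatives.
import numpy as np

rng = np.random.default_rng(0)
D_in, D = 128, 64
W_q = rng.normal(0, 0.1, (D_in, D))    # question encoder (toy)
W_p = rng.normal(0, 0.1, (D_in, D))    # passage encoder (toy)

questions = rng.normal(size=(4, D_in))  # a batch of questions
passages = rng.normal(size=(4, D_in))   # passages[i] is gold for questions[i]

sim = (questions @ W_q) @ (passages @ W_p).T   # (4, 4) similarity matrix

# In-batch negatives: each row is a softmax over every passage in the
# batch, and the diagonal entry is the positive pair.
log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))
print(f"in-batch NLL: {loss:.3f}")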

Latent Retrieval for Weakly Supervised Open Domain Question Answering

It is shown for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs, without any IR system, outperforming BM25 by up to 19 points in exact match.
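A sketch of the latent-retrieval objective, assuming it marginalizes the reader's answer probability over retrieved evidence blocks, as in ORQA; the probabilities below are placeholders for the outputs of learned retriever and reader models.

# Latent retrieval: retrieval is a latent variable z, and training
# maximizes log p(answer | q) = log sum_z p(z | q) * p(answer | q, z),
# so the retriever gets gradient signal without retrieval labels.
import numpy as np

p_retrieve = np.array([0.6, 0.3, 0.1])         # p(z | question), top-3 blocks
p_answer_given_z = np.array([0.8, 0.1, 0.05])  # p(answer | question, z)

log_marginal = np.log(np.sum(p_retrieve * p_answer_given_z))
print(log_marginal)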

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering

It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

The Probabilistic Relevance Framework: BM25 and Beyond

This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F.
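A compact implementation of the standard BM25 scoring function from the framework; k1 = 1.2 and b = 0.75 are common defaults rather than values prescribed by this survey, and the example documents are illustrative.

# BM25: sum over query terms of idf * saturated tf with length normalization.
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    # query: list of tokens; docs: list of token lists.
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(t for d in docs for t in set(d))   # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [doc.split() for doc in
        ["the cat sat on the mat", "dogs and cats", "open domain question answering"]]
print(bm25_scores("question answering".split(), docs))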