Don’t Read Too Much Into It: Adaptive Computation for Open-Domain Question Answering

@inproceedings{Wu2020DontRT,
  title={Don’t Read Too Much Into It: Adaptive Computation for Open-Domain Question Answering},
  author={Yuxiang Wu and Pasquale Minervini and Pontus Stenetorp and Sebastian Riedel},
  booktitle={SUSTAINLP},
  year={2020}
}
Most approaches to Open-Domain Question Answering consist of a light-weight retriever that selects a set of candidate passages, and a computationally expensive reader that examines the passages to identify the correct answer. Previous works have shown that as the number of retrieved passages increases, so does the performance of the reader. However, they assume all retrieved passages are of equal importance and allocate the same amount of computation to them, leading to a substantial increase in computational cost.
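As a rough illustration of the adaptive-computation idea the abstract points at, here is a minimal sketch (not the paper's implementation; `score_layer`, the layer count, and the pruning schedule are all hypothetical) of a reader that prunes unpromising passages between encoder layers instead of giving every passage the full budget:

```python
from typing import Callable, List, Tuple

def adaptive_read(
    question: str,
    passages: List[str],
    score_layer: Callable[[str, str, int], float],  # hypothetical: relevance after `layer` layers
    num_layers: int = 24,
    keep_fraction: float = 0.5,
) -> List[Tuple[str, float]]:
    """Prune low-scoring passages between layers so that promising
    passages receive more layers of computation than unpromising ones."""
    alive: List[Tuple[str, float]] = [(p, 0.0) for p in passages]
    for layer in range(num_layers):
        # Spend one more layer of computation on the surviving passages.
        alive = [(p, score_layer(question, p, layer)) for p, _ in alive]
        alive.sort(key=lambda pair: pair[1], reverse=True)
        # Drop the weakest passages; always keep at least one candidate.
        alive = alive[: max(1, int(len(alive) * keep_fraction))]
    return alive
```

The effect is that most passages receive only a few layers of computation, while the handful that look answer-bearing receive all of it.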

Citations

Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints
TLDR
The proposed Adaptive Passage Encoder keeps the parameters of the base ODQA model fixed, but it overrides the default layer-by-layer computation of the encoder with an AC policy that is trained to optimise the computational efficiency of the model.
Set-to-Sequence Methods in Machine Learning: a Review
TLDR
This paper provides a comprehensive introduction to the field as well as an overview of important machine learning methods tackling both of these key challenges, with a detailed qualitative comparison of selected model architectures.

References

Showing 1-10 of 29 references
Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering
TLDR
A multi-passage BERT model is proposed to globally normalize answer scores across all passages of the same question, and this change enables the QA model to find better answers by utilizing more passages.
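The global-normalization trick is easy to state concretely: compute one softmax over the span scores of every retrieved passage, rather than a separate softmax per passage. A minimal NumPy sketch (variable names are illustrative):

```python
import numpy as np

def global_normalize(span_scores_per_passage):
    """span_scores_per_passage: one 1-D array of raw span logits per passage.
    Returns probabilities that are comparable ACROSS passages, unlike a
    per-passage softmax, which always sums to 1 inside each passage."""
    all_scores = np.concatenate(span_scores_per_passage)
    shifted = np.exp(all_scores - all_scores.max())  # numerically stable softmax
    return shifted / shifted.sum()
```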
Reading Wikipedia to Answer Open-Domain Questions
TLDR
This approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs, indicating that both modules are highly competitive with respect to existing counterparts.
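The retrieval side of this system can be approximated in a few lines; the sketch below uses scikit-learn's HashingVectorizer as a stand-in for the original bigram-hashing implementation, and the documents and query are toy examples:

```python
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer

docs = ["Paris is the capital of France.",
        "Berlin is the capital of Germany."]
# Unigrams and bigrams hashed into a fixed 2^20-dimensional space.
vectorizer = HashingVectorizer(ngram_range=(1, 2), n_features=2**20,
                               alternate_sign=False, norm=None)
tfidf = TfidfTransformer()
doc_vecs = tfidf.fit_transform(vectorizer.transform(docs))
query_vec = tfidf.transform(vectorizer.transform(["capital of France"]))
scores = (doc_vecs @ query_vec.T).toarray().ravel()
print(docs[scores.argmax()])  # the France passage ranks first
```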
The Right Tool for the Job: Matching Model and Instance Complexities
TLDR
This work proposes a modification to contextual representation fine-tuning which allows for an early (and fast) “exit” from neural network calculations for simple instances, and late (and accurate) exit for hard instances during inference.
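The early-exit mechanism can be sketched as a confidence check after every layer. In this single-example sketch, `layers` and `exit_heads` are hypothetical per-layer modules, not the paper's code:

```python
import torch

def early_exit_forward(layers, exit_heads, x, threshold=0.9):
    """Run transformer blocks one at a time; after each block, a small
    classifier head predicts the label, and if it is confident enough
    the remaining (expensive) layers are skipped entirely."""
    probs = None
    for block, head in zip(layers, exit_heads):
        x = block(x)
        probs = torch.softmax(head(x), dim=-1)
        if probs.max().item() > threshold:  # easy instance: exit early
            return probs
    return probs  # hard instance: used the full network
```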
Universal Transformers
  • M. Dehghani, S. Gouws, O. Vinyals, J. Uszkoreit, L. Kaiser
  • ICLR (Poster), OpenReview.net
  • 2019
Calibration of Pre-trained Transformers
  • S. Desai, G. Durrett
  • 2020
Dense Passage Retrieval for Open-Domain Question Answering
TLDR
This work shows that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework.
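The dual-encoder framework reduces to embedding questions and passages separately and scoring them by dot product, commonly trained with in-batch negatives. A PyTorch sketch of such a loss (the encoders themselves are assumed to exist elsewhere):

```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(q_emb, p_emb):
    """q_emb, p_emb: (batch, dim) outputs of separate question/passage
    encoders, where p_emb[i] is the gold passage for question i. Every
    other passage in the batch serves as a negative for question i."""
    scores = q_emb @ p_emb.T                # (batch, batch) dot-product similarities
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(scores, labels)  # push the diagonal to the top
```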
Depth-Adaptive Transformer
TLDR
This paper trains Transformer models which can make output predictions at different stages of the network and investigates different ways to predict how much computation is required for a particular sequence.
End-to-End Open-Domain Question Answering with BERTserini
TLDR
An end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit is demonstrated, showing that fine-tuning pretrained BERT with SQuAD is sufficient to achieve high accuracy in identifying answer spans.
Adaptive Computation Time for Recurrent Neural Networks
TLDR
Performance is dramatically improved and insight is provided into the structure of the data, with more computation allocated to harder-to-predict transitions, such as spaces between words and ends of sentences, which suggests that ACT or other adaptive computation methods could provide a generic method for inferring segment boundaries in sequence data.
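The core mechanism is a halting unit whose outputs accumulate until they cross 1 - eps. A simplified single-example sketch, where `step_fn` and `halt_unit` are assumed modules and per-position bookkeeping is omitted:

```python
import torch

def act_forward(state, step_fn, halt_unit, max_steps=10, eps=0.01):
    """Run the recurrent update until the accumulated halting probability
    crosses 1 - eps; the output is the halting-weighted average of the
    intermediate states, with the final step absorbing the remainder."""
    output = torch.zeros_like(state)
    remainder = 1.0
    for step in range(max_steps):
        state = step_fn(state)                      # one more "ponder" step
        p = torch.sigmoid(halt_unit(state)).item()  # halting probability for this step
        if remainder - p <= eps or step == max_steps - 1:
            output = output + remainder * state     # stop: remainder goes to this state
            break
        output = output + p * state
        remainder -= p
    return output
```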
Conditional Computation in Neural Networks for faster models
TLDR
This paper applies a policy-gradient algorithm to learn policies that optimize a loss function trading off prediction quality against computational cost, proposes a regularization mechanism that encourages diversification of the dropout policy, and presents encouraging empirical results showing that the approach improves computation speed without degrading the quality of the approximation.
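A hedged sketch of what a policy-gradient update for such a computation policy can look like; this is generic REINFORCE with a cost penalty, not the paper's exact objective, and all names are illustrative:

```python
import torch

def reinforce_loss(policy_logits, mask, task_losses, lam=0.1):
    """policy_logits: (batch, units) Bernoulli logits of a 'which blocks to
    run' policy; mask: the (batch, units) 0/1 sample actually executed;
    task_losses: (batch,) task loss obtained under that mask."""
    probs = torch.sigmoid(policy_logits)
    log_prob = (mask * torch.log(probs + 1e-8)
                + (1 - mask) * torch.log(1 - probs + 1e-8)).sum(dim=1)
    compute_cost = mask.mean(dim=1)            # fraction of units that ran
    reward = -(task_losses + lam * compute_cost)
    return -(reward.detach() * log_prob).mean()  # REINFORCE surrogate loss
```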
Understanding inverse document frequency: on theoretical arguments for IDF
  • S. Robertson
  • Mathematics, Computer Science
  • J. Documentation
  • 2004
TLDR
It is shown that the Information Theory approaches are problematic, but that there are good theoretical justifications of both IDF and TF*IDF in the traditional probabilistic model of information retrieval.
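For reference, the quantities under discussion, with N documents in the collection and n_t of them containing term t:

```latex
\mathrm{idf}(t) = \log \frac{N}{n_t},
\qquad
\text{TF*IDF}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
```

Robertson's argument is that these weights are best justified not information-theoretically but within the probabilistic model of retrieval.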