SituatedQA: Incorporating Extra-Linguistic Contexts into QA

Michael J.Q. Zhang and Eunsol Choi

Answers to the same question may change depending on extra-linguistic contexts (when and where the question was asked). To study this challenge, we introduce SituatedQA, an open-retrieval QA dataset where systems must produce the correct answer to a question given its temporal or geographical context. To construct SituatedQA, we first identify such questions in existing QA datasets. We find that a significant proportion of information-seeking questions have context-dependent answers (e.g., …).

Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios

This study frames the task by asking multiple questions that share the same set of possible endings as candidate answers, given a short story, and finds that even current strong pretrained language models struggle to answer the questions consistently.

RealTime QA: What's the Answer Right Now?

It is found that GPT-3 tends to return outdated answers when the retrieved documents do not provide sufficient information, suggesting an important avenue for future research: can an open-domain QA system identify such unanswerable cases and communicate with the user, or even with the retrieval module, to modify the retrieval results?

QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

This study is the largest survey of question answering and reading comprehension resources in NLP to date, providing an overview of the formats and domains of current resources and highlighting lacunae for future work.

ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers

It is shown that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions, and will motivate further research in answering complex questions over long documents.

StreamingQA: A Benchmark for Adaptation to New Knowledge over Time in Question Answering Models

It is shown that parametric models can be updated without full retraining while avoiding catastrophic forgetting, and that for semi-parametric models, adding new articles into the search space allows rapid adaptation; however, models with an outdated underlying LM underperform those with a retrained LM.

Reasoning over Logically Interacted Conditions for Question Answering

A new model, TReasoner, is proposed, consisting of an entailment module, a reasoning module, and a generation module (when the answers are free-form text spans); it achieves state-of-the-art performance on two benchmark conditional QA datasets, outperforming the previous state of the art by 3-10 points.

Towards Continual Knowledge Learning of Language Models

This work constructs a new benchmark and metric to quantify the retention of time-invariant world knowledge, the update of outdated knowledge, and the acquisition of new knowledge in Continual Knowledge Learning.

A Dataset for Answering Time-Sensitive Questions

This work constructs a time-sensitive QA dataset and demonstrates that current models still lack the ability to perform robust temporal understanding and reasoning; the dataset could serve as a benchmark to empower future studies in temporal reasoning.

Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA

An English question-generation model is developed which, as demonstrated via automatic and human evaluation, produces less ambiguous questions and integrates answer group information without any direct supervision.

Entity Cloze By Date: What LMs Know About Unseen Entities

A framework is proposed to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained, and it is shown that models more informed about the entities, such as those with access to a textual version of them, achieve lower perplexity on this benchmark.

QuAC: Question Answering in Context

QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as shown in a detailed qualitative evaluation.

AmbigQA: Answering Ambiguous Open-domain Questions

This paper introduces AmbigQA, a new open-domain question answering task which involves predicting a set of question-answer pairs, where every plausible answer is paired with a disambiguated rewrite of the original question.

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages

A quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora are presented.

Natural Questions: A Benchmark for Question Answering Research

The Natural Questions corpus, a question answering data set, is presented, introducing robust metrics for the purposes of evaluating question answering systems; demonstrating high human upper bounds on these metrics; and establishing baseline results using competitive methods drawn from related literature.

Know What You Don’t Know: Unanswerable Questions for SQuAD

SQuADRUn is a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.

Latent Retrieval for Weakly Supervised Open Domain Question Answering

It is shown for the first time that the retriever and reader can be jointly learned from question-answer string pairs, without any IR system, outperforming BM25 by up to 19 points in exact match.

MultiModalQA: Complex Question Answering over Text, Tables and Images

This paper creates MMQA, a challenging question answering dataset that requires joint reasoning over text, tables and images, and defines a formal language that allows it to take questions that can be answered from a single modality, and combine them to generate cross-modal questions.

Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering

It is shown that a simple modification of adding presuppositions and their verifiability to the input of a competitive end-to-end QA system yields modest gains in QA performance and unanswerability detection, demonstrating the promise of the approach.

REALM: Retrieval-Augmented Language Model Pre-Training

The effectiveness of Retrieval-Augmented Language Model pre-training (REALM) is demonstrated by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA) and is found to outperform all previous methods by a significant margin, while also providing qualitative benefits such as interpretability and modularity.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).