Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks

@inproceedings{Rogers2020GettingCT,
  title={Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks},
  author={Anna Rogers and Olga Kovaleva and Matthew Downey and Anna Rumshisky},
  booktitle={AAAI},
  year={2020}
}
The recent explosion in question answering research produced a wealth of both factoid reading comprehension (RC) and commonsense reasoning datasets. Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. We present QuAIL, the first RC dataset to combine text-based, world-knowledge, and unanswerable questions, and to provide question type annotation that would…

Citations

TellMeWhy: A Dataset for Answering Why-Questions in Narratives
TLDR: This work introduces TellMeWhy, a new crowd-sourced dataset that consists of more than 30k questions and free-form answers concerning why characters in short narratives perform the actions described, and shows that state-of-the-art models are far below human performance on answering such questions.
A guide to the dataset explosion in QA, NLI, and commonsense reasoning
TLDR: This tutorial aims to provide an up-to-date guide to the recent datasets, survey the old and new methodological issues with dataset construction, and outline the existing proposals for overcoming them.
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
TLDR: The largest survey to date of question answering and reading comprehension resources, providing an overview of the various formats and domains of the current resources and highlighting the current lacunae for future work.
What Makes Reading Comprehension Questions Difficult?
TLDR: Crowdsourced multiple-choice reading comprehension questions for passages taken from seven qualitatively distinct sources suggest that selecting a diverse set of passages can help ensure a diverse range of question types, but that passage difficulty need not be a priority.
Subjective Question Answering: Deciphering the inner workings of Transformers in the realm of subjectivity
TLDR: The inner workings (i.e., latent representations) of a Transformer-based architecture are investigated to contribute to a better understanding of these not yet well understood "black-box" models.
QED: A Framework and Dataset for Explanations in Question Answering
TLDR: A large user study is described showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.
UKP-SQuARE: An Online Platform for Question Answering Research
TLDR: UKP-SQuARE is an extensible online QA platform that lets researchers query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests.
Question Generation for Reading Comprehension Assessment by Modeling How and What to Ask
TLDR: A two-step model (HTA-WTA) that takes advantage of previous datasets and can generate questions for a specific targeted comprehension skill, together with a new reading comprehension dataset whose questions are annotated with story-based reading comprehension skills (SBRCS), allowing for a more complete reader assessment.
Comparing Test Sets with Item Response Theory
TLDR: Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models.
Cross-lingual Training for Multiple-Choice Question Answering
TLDR: The results show that both monolingual and multilingual models can be zero-shot transferred to a different dataset in the same language while maintaining performance, and that exams that are more difficult for humans are harder for machines too.
…

References

Showing 1-10 of 35 references
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
TLDR: This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.
QuAC: Question Answering in Context
TLDR: QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as shown in a detailed qualitative evaluation.
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
TLDR: A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
TLDR: It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
TLDR: DROP is a new reading comprehension benchmark which requires Discrete Reasoning Over the content of Paragraphs, presented alongside a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TLDR: A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
Multi-Relational Question Answering from Narratives: Machine Reading and Reasoning in Simulated Worlds
TLDR: This work generates and releases TextWorldsQA, a set of five diverse datasets of dynamic narratives that describe entities and relations in a simulated world, paired with variably compositional questions over that knowledge, along with a lightweight Python-based framework for easily generating arbitrary additional worlds and narratives.
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
TLDR: It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions with considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.
The NarrativeQA Reading Comprehension Challenge
TLDR: A new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts are presented, designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience.
Know What You Don’t Know: Unanswerable Questions for SQuAD
TLDR: SQuAD 2.0 is a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.
…