Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks
@inproceedings{Rogers2020GettingCT,
  title={Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks},
  author={Anna Rogers and Olga Kovaleva and Matthew Downey and Anna Rumshisky},
  booktitle={AAAI},
  year={2020}
}
The recent explosion in question answering research produced a wealth of both factoid reading comprehension (RC) and commonsense reasoning datasets. Combining them presents a different kind of task: deciding not simply whether information is present in the text, but also whether a confident guess could be made for the missing information. We present QuAIL, the first RC dataset to combine text-based, world-knowledge, and unanswerable questions, and to provide question type annotation that would…
34 Citations
TellMeWhy: A Dataset for Answering Why-Questions in Narratives
- Computer Science, FINDINGS
- 2021
This work introduces TellMeWhy, a new crowd-sourced dataset that consists of more than 30k questions and free-form answers concerning why characters in short narratives perform the actions described, and shows that state-of-the-art models are far below human performance on answering such questions.
A guide to the dataset explosion in QA, NLI, and commonsense reasoning
- Computer Science, COLING
- 2020
This tutorial aims to provide an up-to-date guide to the recent datasets, survey the old and new methodological issues with dataset construction, and outline the existing proposals for overcoming them.
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
- Computer Science, ArXiv
- 2021
The largest survey of the field to date, providing an overview of the various formats and domains of current question answering and reading comprehension resources, and highlighting the current lacunae for future work.
What Makes Reading Comprehension Questions Difficult?
- Computer Science, ACL
- 2022
Crowdsourced multiple-choice reading comprehension questions for passages taken from seven qualitatively distinct sources suggest that selecting a diverse set of passages can help ensure a diverse range of question types, but that passage difficulty need not be a priority.
Subjective Question Answering: Deciphering the inner workings of Transformers in the realm of subjectivity
- Computer Science, ArXiv
- 2020
The inner workings (i.e., latent representations) of a Transformer-based architecture are investigated to contribute to a better understanding of these still poorly understood "black-box" models.
QED: A Framework and Dataset for Explanations in Question Answering
- Computer Science, Transactions of the Association for Computational Linguistics
- 2021
A large user study is described showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.
UKP-SQUARE: An Online Platform for Question Answering Research
- Computer Science, ACL
- 2022
UKP-SQuARE is an extensible online QA platform for researchers which allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests.
Question Generation for Reading Comprehension Assessment by Modeling How and What to Ask
- Computer Science, FINDINGS
- 2022
A two-step model (HTA-WTA) that takes advantage of previous datasets and can generate questions for a specific targeted comprehension skill, together with a new reading comprehension dataset containing questions annotated with story-based reading comprehension skills (SBRCS), allowing for a more complete reader assessment.
Comparing Test Sets with Item Response Theory
- Computer Science, ACL
- 2021
Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models.
Cross-lingual Training for Multiple-Choice Question Answering
- Computer Science, Linguistics, Proces. del Leng. Natural
- 2020
The results show that both monolingual and multilingual models can be zero-shot transferred to a different dataset in the same language while maintaining performance, and that exams that are more difficult for humans are harder for machines too.
References
Showing 1-10 of 35 references
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
- Computer Science, ICLR
- 2016
This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classify these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.
QuAC: Question Answering in Context
- Computer Science, EMNLP
- 2018
QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context, as shown in a detailed qualitative evaluation.
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
- Computer Science, ArXiv
- 2018
A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
- Computer Science, EMNLP
- 2018
It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
- Computer Science, NAACL
- 2019
A new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs, and presents a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
- Computer Science, EMNLP
- 2016
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
Multi-Relational Question Answering from Narratives: Machine Reading and Reasoning in Simulated Worlds
- Computer Science, ACL
- 2018
This work generates and releases TextWorldsQA, a set of five diverse datasets containing dynamic narratives that describe entities and relations in a simulated world, paired with variably compositional questions over that knowledge, along with a lightweight Python-based framework for easily generating arbitrary additional worlds and narratives.
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
- Computer Science, ACL
- 2017
It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, exhibits considerable syntactic and lexical variability between questions and the corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.
The NarrativeQA Reading Comprehension Challenge
- Computer Science, TACL
- 2018
A new dataset and set of tasks are presented in which the reader must answer questions about stories by reading entire books or movie scripts, designed so that successfully answering the questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience.
Know What You Don’t Know: Unanswerable Questions for SQuAD
- Computer Science, ACL
- 2018
SQuADRUn is a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.