IIRC: A Dataset of Incomplete Information Reading Comprehension Questions

@inproceedings{Ferguson2020IIRCAD,
  title={IIRC: A Dataset of Incomplete Information Reading Comprehension Questions},
  author={James Ferguson and Matt Gardner and Tushar Khot and Pradeep Dasigi},
  booktitle={EMNLP},
  year={2020}
}
Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension (RC) tasks only focus on questions for which the contexts provide all the information required to answer them, thus not evaluating a system's performance at identifying a potential lack of sufficient information and locating sources for that information. To fill this gap, we present a dataset, IIRC, with more than 13K questions over paragraphs from English Wikipedia that… 
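For concreteness, a minimal Python sketch of what one incomplete-information example might look like; the field names and the example itself are illustrative assumptions, not IIRC's actual schema.

# A hypothetical incomplete-information RC example; field names are
# illustrative, not IIRC's real schema.
example = {
    "question": "How long had the bridge been open when it collapsed?",
    # The anchor paragraph alone lacks the opening date:
    "context": "The bridge collapsed in 2007 after decades of service.",
    # Linked Wikipedia pages hold the missing fact (it opened in 1967):
    "links": ["I-35W Mississippi River bridge"],
    "answer": {"type": "span", "value": "40 years"},
}
print(example["question"])

A system is expected to first detect that the context alone is insufficient, then follow the links to locate the missing information.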

Citations

A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers
TLDR
Qasper is presented, a dataset of 5049 questions over 1585 Natural Language Processing papers that is designed to facilitate document-grounded, information-seeking QA, and finds that existing models that do well on other QA tasks do not perform well on answering these questions.
Retrieval Data Augmentation Informed by Downstream Question Answering Performance
TLDR
This work identifies relevant passages based on whether they are useful for a trained QA model to arrive at the correct answers, and develops a search process guided by the QA model’s loss that generalizes better to the end QA task.
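As a rough sketch of this selection idea (the names here are hypothetical): rank candidate passages by how well a trained QA model recovers the gold answer when given each one, and keep the lowest-loss passages.

def select_passages(qa_loss, question, answer, candidates, k=2):
    # Rank candidates by a trained QA model's loss on the gold answer;
    # lower loss means the passage was more useful for answering.
    scored = sorted(candidates, key=lambda p: qa_loss(question, p, answer))
    return scored[:k]

# Toy usage with a stand-in loss (in practice, a trained reader's NLL):
dummy_loss = lambda q, p, a: 0.0 if a in p else 1.0
print(select_passages(dummy_loss, "Who wrote Hamlet?", "Shakespeare",
                      ["Shakespeare wrote Hamlet.", "Hamlet is a play."], k=1))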
ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers
TLDR
It is shown that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions, and will motivate further research in answering complex questions over long documents.
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
TLDR
The largest survey of the field to date of question answering and reading comprehension, providing an overview of the various formats and domains of the current resources, and highlighting the current lacunae for future work.
Mitigating False-Negative Contexts in Multi-document Question Answering with Retrieval Marginalization
TLDR
A new parameterization of set-valued retrieval that handles unanswerable queries is developed, and it is shown that marginalizing over this set during training allows a model to mitigate false negatives in supporting evidence annotations.
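A generic illustration of the marginalization idea (not the paper's exact set-valued parameterization): instead of supervising a single annotated context, which may be a false negative, sum the answer likelihood over all retrieved contexts.

import torch

def marginalized_nll(answer_logprobs, context_logprobs):
    # answer_logprobs:  (num_contexts,) log p(answer | context_i)
    # context_logprobs: (num_contexts,) log p(context_i | question)
    # Loss is -log sum_i p(context_i | q) * p(answer | context_i).
    return -torch.logsumexp(answer_logprobs + context_logprobs, dim=0)

# Toy usage: three retrieved contexts; the second best explains the answer.
ans = torch.log(torch.tensor([0.01, 0.90, 0.05]))
ctx = torch.log_softmax(torch.tensor([0.2, 1.5, 0.3]), dim=0)
print(float(marginalized_nll(ans, ctx)))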
English Machine Reading Comprehension Datasets: A Survey
TLDR
This paper surveys 54 English Machine Reading Comprehension datasets and reveals that Wikipedia is by far the most common data source and that there is a relative lack of why, when, and where questions across datasets.
Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
TLDR
This work proposes to leverage semi-structured tables, and automatically generate at scale question-paragraph pairs, where answering the question requires reasoning over multiple facts in the paragraph, and shows that the model, PReasM, substantially outperforms T5, a popular pre-trained encoder-decoder model.
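A toy version of the generation recipe (the table, templates, and fields below are invented for illustration, not the paper's): verbalize two table facts into a synthetic paragraph, then ask a question whose answer requires combining them.

table = [
    {"country": "France", "capital": "Paris", "population_m": 68},
    {"country": "Japan", "capital": "Tokyo", "population_m": 124},
]
# Verbalize the table rows into a paragraph...
paragraph = " ".join(
    f"{r['capital']} is the capital of {r['country']}, which has "
    f"about {r['population_m']} million people." for r in table
)
# ...and pose a question that requires reasoning over both facts.
question = "What is the capital of the country with the larger population?"
answer = max(table, key=lambda r: r["population_m"])["capital"]  # "Tokyo"
print(paragraph, question, answer, sep="\n")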
Break, Perturb, Build: Automatic Perturbation of Reasoning Paths Through Question Decomposition
TLDR
This work introduces the “Break, Perturb, Build” (BPB) framework for automatic reasoning-oriented perturbation of question-answer pairs, and demonstrates the effectiveness of BPB by creating evaluation sets for three reading comprehension benchmarks, generating thousands of high-quality examples without human intervention.
Interactive Machine Comprehension with Dynamic Knowledge Graphs
TLDR
This work hypothesizes that graph representations are good inductive biases, which can serve as an agent’s memory mechanism in iMRC tasks and describes methods that dynamically build and update these graphs during information gathering, as well as neural models to encode graph representations in RL agents.
Learning to Solve Complex Tasks by Talking to Agents
TLDR
This work proposes a new benchmark, COMMAQA, containing three kinds of complex reasoning tasks designed to be solved by “talking” to four agents with different capabilities, in the hope that it enables the development of “green” AI systems that build upon existing agents.

References

SHOWING 1-10 OF 28 REFERENCES
Constructing Datasets for Multi-hop Reading Comprehension Across Documents
TLDR
A novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods, in which a model learns to seek and combine evidence, effectively performing multi-hop (i.e., multi-step) inference.
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
TLDR
A new reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs, and presents a new model that combines reading comprehension methods with simple numerical reasoning to achieve 51% F1.
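To make “discrete reasoning” concrete, a toy example of the kind of question DROP targets: the answer is not a passage span but the result of an operation (here, subtraction) over numbers extracted from the passage. The regex extraction below is purely illustrative, not the paper's model.

import re

passage = ("The home team scored 28 points in the first half "
           "and 13 points in the second half.")
question = "How many more points were scored in the first half than in the second?"

# Toy discrete reasoning: pull the numbers out of the passage, then
# apply the arithmetic operation implied by the question.
first, second = (int(n) for n in re.findall(r"\d+", passage))
answer = first - second
print(answer)  # 15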
Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning
TLDR
This work presents a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference among entities in over 4.7K English paragraphs from Wikipedia, and shows that state-of-the-art reading comprehension models perform significantly worse than humans on this benchmark.
Interactive Machine Comprehension with Information Seeking Agents
TLDR
This paper occludes the majority of a document’s text and adds context-sensitive commands that reveal “glimpses” of the hidden text to a model, arguing that this setting can contribute to scaling models to web-level QA scenarios.
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
TLDR
It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross sentence reasoning to find answers.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TLDR
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension
TLDR
DuoRC is proposed, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets and could complement other RC datasets to explore novel neural approaches for studying language understanding.
What’s Missing: A Knowledge Gap Guided Approach for Multi-hop Question Answering
TLDR
A novel approach is developed that explicitly identifies the knowledge gap between a key span in the provided knowledge and the answer choices and learns to fill this gap by determining the relationship between the span and an answer choice, based on retrieved knowledge targeting this gap.
Quasar: Datasets for Question Answering by Search and Reading
We present two new large-scale datasets aimed at evaluating systems designed to comprehend a natural language query and extract its answer from a large corpus of text. The Quasar-S dataset consists of cloze-style queries constructed from definitions of software entity tags on Stack Overflow…
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
TLDR
A quantitative analysis of the data quality and example-level qualitative linguistic analyses of observed language phenomena that would not be found in English-only corpora are presented.