WIQA: A dataset for “What if...” reasoning over procedural text

@article{Tandon2019WIQAAD,
  title={WIQA: A dataset for “What if...” reasoning over procedural text},
  author={Niket Tandon and Bhavana Dalvi and Keisuke Sakaguchi and Antoine Bosselut and P. Clark},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.04739}
}
We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text. [...] We find that state-of-the-art models achieve 73.8% accuracy, well below the human performance of 96.3%. We analyze the challenges, in particular tracking chains of influences, and present the dataset as an open challenge to the community.
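For readers who want to inspect the data, here is a minimal loading sketch in Python. The Hugging Face Hub id "wiqa" and the field names are assumptions about the public mirror of the dataset, not details from the paper itself.

# Minimal sketch for exploring WIQA; assumes the dataset is mirrored
# on the Hugging Face Hub under the id "wiqa" (field names may differ).
from datasets import load_dataset

wiqa = load_dataset("wiqa")            # splits: train / validation / test
example = wiqa["train"][0]

# Each example pairs a procedural paragraph with a "What if..." question
# and a 3-way effect label (more / less / no effect).
print(example["question_stem"])        # the "What if..." question (assumed field)
print(example["question_para_step"])   # steps of the procedure (assumed field)
print(example["answer_label"])         # e.g. "more", "less", "no_effect" (assumed field)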
CLEVR_HYP: A Challenge Dataset and Baselines for Visual Question Answering with Hypothetical Actions over Images
TLDR
This paper formulates a vision-language question answering task based on the CLEVR dataset, modifies the best existing VQA methods to propose baseline solvers for this task, and motivates the development of better vision-language models by providing insights into the capability of diverse architectures to perform joint reasoning over the image-text modality.
Enhancing Multiple-Choice Question Answering with Causal Knowledge
TLDR
Novel strategies for the representation of causal knowledge are presented, and the empirical results demonstrate the efficacy of augmenting pretrained models with external causal knowledge for multiple-choice causal question answering.
Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models
TLDR
This structured survey provides an overview of the evolving research area by categorising reported weaknesses in models and datasets and the methods proposed to reveal and alleviate those weaknesses for the English language.
QED: A Framework and Dataset for Explanations in Question Answering
TLDR
A large user study is described showing that the presence of QED explanations significantly improves the ability of untrained raters to spot errors made by a strong neural QA baseline.
EIGEN: Event Influence GENeration using Pre-trained Language Models
TLDR
This paper presents EIGEN, a method to leverage pre-trained language models to generate event influences conditioned on a context, the nature of their influence, and the distance in a reasoning chain, and derives a new dataset for research and evaluation of methods for event influence generation.
CURIE: An Iterative Querying Approach for Reasoning About Situations
TLDR
It is shown that the situational graphs (st graphs) generated by CURIE improve a situational reasoning end task (WIQA-QA) by 3 points of accuracy when model inputs are simply augmented with the generated graphs, especially on a hard subset that requires background knowledge and multi-hop reasoning.
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
TLDR
This paper introduces CROSSFIT, a task setup for studying cross-task few-shot learning ability that standardizes seen/unseen task splits, data access during different learning stages, and the evaluation protocols, and presents NLP Few-shot Gym, a repository of 160 few-shot tasks covering diverse task categories and applications, converted to a unified text-to-text format.
Improving Neural Model Performance through Natural Language Feedback on Their Explanations
TLDR
This work introduces MERCURIE, an interactive system that refines its explanations for a given reasoning task by getting human feedback in natural language, and generates graphs that have 40% fewer inconsistencies compared with the off-the-shelf system.
DomiKnowS: A Library for Integration of Symbolic Domain Knowledge in Deep Learning
We demonstrate a library for the integration of domain knowledge in deep learning architectures. Using this library, the structure of the data is expressed symbolically via graph declarations and the ...
Thinking Like a Skeptic: Defeasible Inference in Natural Language
TLDR
From Defeasible NLI, both a classification and generation task for defeasible inference are developed, and it is demonstrated that the generation task is much more challenging.

References

Showing 1-10 of 15 references
Reasoning about Actions and State Changes by Injecting Commonsense Knowledge
TLDR
This paper shows how the predicted effects of actions in the context of a paragraph can be improved in two ways: by incorporating global, commonsense constraints (e.g., a non-existent entity cannot be destroyed), and by biasing reading with preferences from large-scale corpora.
Tracking State Changes in Procedural Text: a Challenge Dataset and Models for Process Paragraph Comprehension
TLDR
A new dataset and models for comprehending paragraphs about processes, an important genre of text describing a dynamic world, are presented, and two new neural models that exploit alternative mechanisms for state prediction are introduced, in particular LSTM input encoding and span prediction.
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
TLDR
This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
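As a concrete illustration of the "one additional output layer" claim, here is a minimal fine-tuning sketch in Python using the Hugging Face transformers API. The checkpoint name and the 3-way label space (chosen to match WIQA's more/less/no-effect scheme) are illustrative assumptions, not part of the paper.

# Sketch: BERT plus one fresh classification layer, ready for fine-tuning.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # adds the single new output layer

inputs = tokenizer(
    "Suppose less rain falls. How does this affect plant growth?",   # question
    "Rain falls. Water soaks into the ground. Plants absorb the water.",  # passage
    return_tensors="pt", truncation=True)
logits = model(**inputs).logits  # shape (1, 3); train with cross-entropy loss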
Tracking the World State with Recurrent Entity Networks
TLDR
The EntNet sets a new state of the art on the bAbI tasks, is the first method to solve all the tasks in the 10k-training-example setting, and can generalize past its training horizon.
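To make the tracking mechanism concrete, below is a minimal sketch of the gated memory update at the core of recurrent entity networks, written from the paper's equations; the tensor shapes, the ReLU nonlinearity, and the function name are illustrative assumptions.

import torch

def entnet_update(h, w, s, U, V, W):
    # h: (slots, d) memory values; w: (slots, d) slot keys; s: (d,) input encoding
    # U, V, W: (d, d) shared weights
    gate = torch.sigmoid(h @ s + w @ s)                   # per-slot relevance to the input
    candidate = torch.relu(h @ U.T + w @ V.T + s @ W.T)   # proposed new slot content
    h = h + gate.unsqueeze(1) * candidate                 # gated write into each slot
    return h / (h.norm(dim=1, keepdim=True) + 1e-8)       # normalize, letting old content fade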
Globally Coherent Text Generation with Neural Checklist Models
TLDR
The neural checklist model is presented: a recurrent neural network that models global coherence by storing and updating an agenda of text strings that should be mentioned somewhere in the output. It demonstrates high coherence with greatly improved semantic coverage of the agenda.
A Structured Learning Approach to Temporal Relation Extraction
TLDR
It is suggested that it is important to take dependencies into account while learning to identify temporal relations between events, and a structured learning approach is proposed to address this challenge.
A Decomposable Attention Model for Natural Language Inference
We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable.
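A minimal sketch of the attend step that drives this decomposition is shown below; plain dot-product scores stand in for the feed-forward projections the paper applies before alignment.

import torch
import torch.nn.functional as F

def attend(a, b):
    # a: (len_a, d) premise token vectors; b: (len_b, d) hypothesis token vectors
    scores = a @ b.T                        # (len_a, len_b) unnormalized alignments
    beta = F.softmax(scores, dim=1) @ b     # for each token of a, a soft summary of b
    alpha = F.softmax(scores, dim=0).T @ a  # for each token of b, a soft summary of a
    return beta, alpha                      # each pair is then compared and aggregated

Because every pairwise alignment is computed independently, the subproblems can be batched and solved in parallel, which is where the parallelizability claim comes from.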
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
TLDR
The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
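The multiplicative weight-update rule at the heart of this result (the Hedge algorithm) fits in a few lines; the array-based loss interface and the learning rate below are illustrative assumptions.

import numpy as np

def hedge(losses, eta=0.5):
    # losses: (rounds, experts) array of per-round losses in [0, 1]
    w = np.ones(losses.shape[1]) / losses.shape[1]  # start with uniform weights
    for loss in losses:
        w *= np.exp(-eta * loss)  # shrink each expert's weight by its loss
        w /= w.sum()              # renormalize to a probability distribution
    return w

Exponentiating the loss is equivalent to the paper's beta**loss update with beta = exp(-eta), and it guarantees total loss close to that of the best single expert in hindsight.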