QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions

@article{Tafjord2019QuaRTzAO,
  title={QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions},
  author={Oyvind Tafjord and Matt Gardner and Kevin Lin and Peter Clark},
  journal={ArXiv},
  year={2019},
  volume={abs/1909.03553}
}
We introduce the first open-domain dataset, called QuaRTz, for reasoning about textual qualitative relationships. QuaRTz contains general qualitative statements, e.g., “A sunscreen with a higher SPF protects the skin longer.”, twinned with 3864 crowdsourced situated questions, e.g., “Billy is wearing sunscreen with a lower SPF than Lucy. Who will be best protected from the sun?”, plus annotations of the properties being compared. Unlike previous datasets, the general knowledge is textual and… 
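The abstract describes each QuaRTz item as a general qualitative statement twinned with a situated question plus annotations of the property being compared. A minimal sketch of that structure, using the abstract's own sunscreen example — the field names and the toy resolution heuristic are illustrative assumptions, not the dataset's actual schema:

```python
# Sketch of a QuaRTz-style item: a general qualitative statement twinned
# with a situated question and an annotation of the compared property.
# Field names are assumptions for illustration, not the real schema.
example_item = {
    "knowledge": "A sunscreen with a higher SPF protects the skin longer.",
    "question": ("Billy is wearing sunscreen with a lower SPF than Lucy. "
                 "Who will be best protected from the sun?"),
    "choices": ["Billy", "Lucy"],
    "answer": "Lucy",
    # Property annotation: the qualitative relation being compared.
    "property": {"name": "SPF",
                 "direction_in_knowledge": "higher",
                 "direction_in_question": "lower"},
}

def resolve(item):
    """Toy heuristic, not a real model: if the question's comparative is
    flipped relative to the knowledge sentence, the entity not described
    by the flipped comparative (the second choice here) is favored."""
    flipped = (item["property"]["direction_in_question"]
               != item["property"]["direction_in_knowledge"])
    return item["choices"][1] if flipped else item["choices"][0]

print(resolve(example_item))  # Lucy
```

The point of the twinned structure is that answering requires composing the general statement with the situated comparison, which is what makes the task a qualitative-reasoning benchmark rather than span extraction.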

Citations of this paper

UKP-SQUARE: An Online Platform for Question Answering Research
TLDR: UKP-SQuARE is an extensible online QA platform for researchers which allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated behavioural tests.
Prediction or Comparison: Toward Interpretable Qualitative Reasoning
TLDR: This work categorizes qualitative reasoning tasks into two types, prediction and comparison, and adopts neural network modules trained end-to-end to simulate the two reasoning processes.
Evaluating Models’ Local Decision Boundaries via Contrast Sets
TLDR: A more rigorous annotation paradigm for NLP that helps to close systematic gaps in the test data; it recommends that dataset authors manually perturb test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets.
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension
TLDR: The largest survey to date of question answering and reading comprehension resources, providing an overview of the formats and domains of current datasets and highlighting gaps for future work.
Competency Problems: On Finding and Removing Artifacts in Language Data
TLDR: This work argues that for complex language understanding tasks, all simple feature correlations are spurious, formalizes this notion into a class of problems called competency problems, and gives a simple statistical test for dataset artifacts that is used to reveal more subtle biases.
CURIE: An Iterative Querying Approach for Reasoning About Situations
TLDR: CURIE is proposed, a method to iteratively build a graph of relevant consequences explicitly in a structured situational graph (st graph) using natural language queries over a finetuned language model; its improvements come mainly from a hard subset of the data that requires background knowledge and multi-hop reasoning.
Investigating the Benefits of Free-Form Rationales
TLDR: This work presents human studies showing that ECQA rationales indeed provide additional background information to understand a decision, while over 88% of CoS-E rationales do not, and investigates the utility of rationales as an additional source of supervision by varying the quantity and quality of rationales during training.
ILDAE: Instance-Level Difficulty Analysis of Evaluation Data
TLDR: Instance-Level Difficulty Analysis of Evaluation data (ILDAE) is conducted in a large-scale setup of 23 datasets, and five novel applications are demonstrated, such as conducting efficient-yet-accurate evaluations with fewer instances, saving computational cost and time.
Natural Language QA Approaches using Reasoning with External Knowledge
TLDR: A survey of recent work in the traditional fields of knowledge representation and reasoning and the fields of NL understanding and NLQA, presented to help establish a bridge between multiple areas of AI.
Probabilistic Graph Reasoning for Natural Proof Generation
TLDR: This paper proposes PROBR, a novel approach for joint answer prediction and proof generation via an induced graphical model that defines a joint probabilistic distribution over all possible proof graphs and answers.

References

Showing 1–10 of 15 references
QuaRel: A Dataset and Models for Answering Questions about Qualitative Relationships
TLDR: This work makes inroads into answering complex, qualitative questions that require reasoning, and into scaling to new relationships at low cost, with two novel models for this task built as extensions of type-constrained semantic parsing.
Semantic Parsing on Freebase from Question-Answer Pairs
TLDR: This paper trains a semantic parser that scales up to Freebase and outperforms the state-of-the-art parser on the dataset of Cai and Yates (2013), despite not having annotated logical forms.
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
TLDR: It is shown that HotpotQA is challenging for the latest QA systems, and that the supporting facts enable models to improve performance and make explainable predictions.
Neural Semantic Parsing with Type Constraints for Semi-Structured Tables
TLDR: A new semantic parsing model for answering compositional questions on semi-structured Wikipedia tables achieves state-of-the-art accuracy; type constraints and entity linking are shown to be valuable components to incorporate in neural semantic parsers.
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
TLDR: A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.
Scaling up Linguistic Processing of Qualitative Processes
TLDR: This paper describes building on and improving representations used in prior work to scale up to chapter-length texts, and to extract complete type-level rather than instance-level models.
RACE: Large-scale ReAding Comprehension Dataset From Examinations
TLDR: The proportion of questions that require reasoning is much larger in RACE than in other benchmark datasets for reading comprehension, and there is a significant gap between the performance of state-of-the-art models and ceiling human performance.
Simple and Effective Multi-Paragraph Reading Comprehension
TLDR: It is shown that performance can be significantly improved by using a modified training scheme that teaches the model to ignore non-answer-containing paragraphs, which involves sampling multiple paragraphs from each document and using an objective function that requires the model to produce globally correct output.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
AllenNLP: A Deep Semantic Natural Language Processing Platform
TLDR: AllenNLP is described, a library for applying deep learning methods to NLP research, with easy-to-use command-line tools, declarative configuration-driven experiments, and modular NLP abstractions.