PIQA: Reasoning about Physical Commonsense in Natural Language

@inproceedings{Bisk2020PIQARA,
  title={PIQA: Reasoning about Physical Commonsense in Natural Language},
  author={Yonatan Bisk and Rowan Zellers and Ronan Le Bras and Jianfeng Gao and Yejin Choi},
  booktitle={AAAI},
  year={2020}
}
To apply eyeshadow without a brush, should I use a cotton swab or a toothpick? Questions requiring this kind of physical commonsense pose a challenge to today's natural language understanding systems. While recent pretrained models (such as BERT) have made progress on question answering over more abstract domains – such as news articles and encyclopedia entries, where text is plentiful – in more physical domains, text is inherently limited due to reporting bias. Can AI systems learn to reliably… 

RiddleSense: Reasoning about Riddle Questions Featuring Linguistic Creativity and Commonsense Knowledge

TLDR
RiddleSense, a new multiple-choice question answering task, is presented, which comes with the first large dataset (5.7k examples) for answering riddle-style commonsense questions, and it is pointed out that there is a large gap between the best supervised model and human performance.

COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences

TLDR
This work introduces a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sample paired with its complementary counterpart, resulting in 4k sentence pairs, and proposes a pairwise accuracy metric to reliably measure an agent's ability to perform commonsense reasoning over a given situation.
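The summary above mentions a pairwise accuracy metric. A minimal sketch of such a metric, assuming credit is given only when both statements of a complementary pair are classified correctly (the flat list layout below is an illustrative assumption, not necessarily the paper's exact formulation):

```python
def pairwise_accuracy(predictions, labels):
    """Fraction of complementary pairs where BOTH statements are judged correctly.

    predictions and labels are flat lists in which items 2i and 2i+1 form
    one complementary pair (layout assumed for illustration).
    """
    assert len(predictions) == len(labels) and len(predictions) % 2 == 0
    n_pairs = len(predictions) // 2
    correct_pairs = sum(
        predictions[2 * i] == labels[2 * i]
        and predictions[2 * i + 1] == labels[2 * i + 1]
        for i in range(n_pairs)
    )
    return correct_pairs / n_pairs
```

Because a single wrong judgment spoils the whole pair, this metric is stricter than per-statement accuracy and penalizes models that guess inconsistently across complementary sentences.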

Shortcutted Commonsense: Data Spuriousness in Deep Learning of Commonsense Reasoning

TLDR
A study of several prominent benchmarks that involve commonsense reasoning, along with a number of key stress experiments, seeking insight into whether the models are learning transferable generalizations intrinsic to the problem at stake or merely exploiting incidental shortcuts in the data items.

Knowledge-driven Self-supervision for Zero-shot Commonsense Question Answering

TLDR
A novel neuro-symbolic framework for zero-shot question answering across commonsense tasks is proposed and it is shown that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks.

Knowledge-driven Data Construction for Zero-shot Evaluation in Commonsense Question Answering

TLDR
A novel neuro-symbolic framework for zero-shot question answering across commonsense tasks is proposed and it is shown that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks.

Prompting Contrastive Explanations for Commonsense Reasoning Tasks

TLDR
Inspired by the contrastive nature of human explanations, large pretrained language models are used to complete explanation prompts which contrast alternatives according to the key attribute(s) required to justify the correct answer.

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

TLDR
This work introduces Cross-lingual Choice of Plausible Alternatives (XCOPA), a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages, revealing that current methods based on multilingual pretraining and zero-shot fine-tuning transfer suffer from the curse of multilinguality and fall short of monolingual performance by a large margin.

RiddleSense: Answering Riddle Questions as Commonsense Reasoning

TLDR
RiddleSense is proposed, a novel multiple-choice question answering challenge for benchmarking higher-order commonsense reasoning models; it is the first large dataset for riddle-style commonsense question answering, where the distractors are crowdsourced from human annotators.

Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering

TLDR
This paper performs a survey of recent commonsense QA methods and provides a systematic analysis of popular knowledge resources and knowledge-integration methods, across benchmarks from multiple commonsense datasets, and shows that attention-based injection seems to be a preferable choice for knowledge integration.

Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding

TLDR
This paper introduces Tiered Reasoning for Intuitive Physics (TRIP), a novel commonsense reasoning dataset with dense annotations that enable multi-tiered evaluation of machines' reasoning process; the empirical results show that while large LMs can achieve high end-task performance, they struggle to support their predictions with valid supporting evidence.
...

References

Showing 1-10 of 35 references

Verb Physics: Relative Physical Knowledge of Actions and Objects

TLDR
An approach to infer relative physical knowledge of actions and objects along five dimensions (e.g., size, weight, and strength) from unstructured natural language text is presented.

Social IQA: Commonsense Reasoning about Social Interactions

TLDR
It is established that Social IQa, the first large-scale benchmark for commonsense reasoning about social situations, is challenging for existing question-answering models based on pretrained language models, which trail human performance by a gap of more than 20%.

SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference

TLDR
This paper introduces the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning, and proposes Adversarial Filtering (AF), a novel procedure that constructs a de-biased dataset by iteratively training an ensemble of stylistic classifiers, and using them to filter the data.
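Adversarial Filtering, as summarized above, repeatedly trains classifiers and swaps out the negatives they find easiest to detect. A heavily simplified sketch of that loop (the `train_fn` interface, swap fraction, and single-model setup are illustrative assumptions; the paper uses an iteratively retrained ensemble of stylistic classifiers):

```python
def adversarial_filtering(real, candidates, train_fn, n_keep, n_rounds=3):
    """Simplified Adversarial Filtering loop (sketch, not the paper's exact code).

    Each round: fit a classifier on the current real-vs-distractor split,
    then replace the distractors it rejects most confidently with fresh
    candidates, so the surviving negatives are stylistically hard ones.
    """
    chosen, pool = candidates[:n_keep], candidates[n_keep:]
    for _ in range(n_rounds):
        if not pool:
            break  # no fresh candidates left to swap in
        model = train_fn(real, chosen)        # classifier: real vs. distractor
        chosen.sort(key=model.score)          # low score = easily flagged as fake
        n_swap = min(len(pool), max(1, len(chosen) // 4))
        easy, chosen = chosen[:n_swap], chosen[n_swap:]
        chosen = chosen + pool[:n_swap]       # swap in fresh candidates
        pool = pool[n_swap:] + easy           # recycle the easy ones to the pool
    return chosen
```

The net effect is a dataset whose distractors cannot be separated from real continuations by surface style alone, which is what makes the resulting benchmark adversarial.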

Annotation Artifacts in Natural Language Inference Data

TLDR
It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
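The finding above, that a premise-blind model beats chance on SNLI, can be illustrated with a toy hypothesis-only heuristic (the word lists below are illustrative assumptions, not the paper's actual classifier, which was a trained text categorization model):

```python
# Toy premise-blind heuristic illustrating annotation artifacts:
# negation words in crowd-written hypotheses correlate with "contradiction",
# and vague modifiers correlate with "neutral" (word lists are assumptions).
NEGATIONS = {"not", "no", "never", "nobody", "nothing"}
VAGUE_MODIFIERS = {"tall", "sad", "first", "favorite"}

def hypothesis_only_guess(hypothesis):
    """Predict an NLI label from the hypothesis alone, never seeing the premise."""
    tokens = set(hypothesis.lower().split())
    if tokens & NEGATIONS:
        return "contradiction"
    if tokens & VAGUE_MODIFIERS:
        return "neutral"
    return "entailment"
```

That such a premise-blind rule performs above the 33% chance level is exactly the artifact the paper documents: annotators leave stylistic fingerprints that leak the label.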

From Recognition to Cognition: Visual Commonsense Reasoning

TLDR
To move towards cognition-level understanding, a new reasoning engine is presented, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning.

HellaSwag: Can a Machine Really Finish Your Sentence?

TLDR
The construction of HellaSwag, a new challenge dataset, and its resulting difficulty, sheds light on the inner workings of deep pretrained models, and suggests a new path forward for NLP research, in which benchmarks co-evolve with the evolving state-of-the-art in an adversarial way, so as to present ever-harder challenges.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

TLDR
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
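The F1 figure above is the token-overlap F1 commonly used for SQuAD-style span evaluation: the harmonic mean of precision and recall over the bag of tokens shared by the predicted and gold answers. A sketch of that computation:

```python
from collections import Counter

def token_f1(prediction_tokens, gold_tokens):
    """Token-overlap F1 for SQuAD-style answer evaluation (sketch).

    Counter & Counter gives the multiset intersection, so repeated
    tokens are only credited as often as they appear in both answers.
    """
    common = Counter(prediction_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(prediction_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```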

WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale

TLDR
This work introduces WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset, and establishes new state-of-the-art results on five related benchmarks.

Language Models as Knowledge Bases?

TLDR
An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction

TLDR
This investigation reveals promising results when implicit knowledge from deep pre-trained language models is transferred to generate explicit knowledge in commonsense knowledge graphs, and suggests that using generative commonsense models for automatic commonsense KB completion could soon be a plausible alternative to extractive methods.