PIQA: Reasoning about Physical Commonsense in Natural Language

@inproceedings{Bisk2020PIQARA,
  title={PIQA: Reasoning about Physical Commonsense in Natural Language},
  author={Yonatan Bisk and Rowan Zellers and Ronan Le Bras and Jianfeng Gao and Yejin Choi},
  booktitle={AAAI},
  year={2020}
}
To apply eyeshadow without a brush, should I use a cotton swab or a toothpick? Questions requiring this kind of physical commonsense pose a challenge to today's natural language understanding systems. While recent pretrained models (such as BERT) have made progress on question answering over more abstract domains – such as news articles and encyclopedia entries, where text is plentiful – in more physical domains, text is inherently limited due to reporting bias. Can AI systems learn to reliably…
COM2SENSE: A Commonsense Reasoning Benchmark with Complementary Sentences
TLDR
This work introduces a new commonsense reasoning benchmark dataset comprising natural language true/false statements, with each sample paired with its complementary counterpart, resulting in 4k sentence pairs, and proposes a pairwise accuracy metric to reliably measure an agent's ability to perform commonsense reasoning over a given situation.
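As a rough illustration of the pairwise metric described in that summary (a sketch under assumptions, not the paper's implementation; the data format and function names below are hypothetical), a model is credited only when it classifies both statements of a complementary pair correctly:

```python
def pairwise_accuracy(pairs, predict):
    """Pairwise accuracy sketch.

    pairs:   list of ((statement_a, label_a), (statement_b, label_b)) tuples,
             where each label is a bool (assumed format).
    predict: callable mapping a statement string to a True/False prediction.
    """
    if not pairs:
        return 0.0
    correct_pairs = 0
    for (stmt_a, label_a), (stmt_b, label_b) in pairs:
        # Credit the pair only if BOTH complementary statements are answered correctly.
        if predict(stmt_a) == label_a and predict(stmt_b) == label_b:
            correct_pairs += 1
    return correct_pairs / len(pairs)
```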
Knowledge-driven Self-supervision for Zero-shot Commonsense Question Answering
TLDR
A novel neuro-symbolic framework for zero-shot question answering across commonsense tasks is proposed and it is shown that, while an individual knowledge graph is better suited for specific tasks, a global knowledge graph brings consistent gains across different tasks.
Prompting Contrastive Explanations for Commonsense Reasoning Tasks
TLDR
Inspired by the contrastive nature of human explanations, large pretrained language models are used to complete explanation prompts which contrast alternatives according to the key attribute(s) required to justify the correct answer.
RiddleSense: Answering Riddle Questions as Commonsense Reasoning
TLDR
RiddleSense is proposed, a novel multiple-choice question answering challenge for benchmarking higher-order commonsense reasoning models, which is the first large dataset for riddle-style commonsense question answering, where the distractors are crowdsourced from human annotators.
Do Fine-tuned Commonsense Language Models Really Generalize?
TLDR
Clear evidence is found that fine-tuned commonsense language models still do not generalize well, even with moderate changes to the experimental setup, and may, in fact, be susceptible to dataset bias.
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
TLDR
It is proposed that manually constructed CSKGs will never achieve the coverage necessary to be applicable in all situations encountered by NLP agents, and a new evaluation framework is introduced for testing the utility of KGs based on how effectively implicit knowledge representations can be learned from them.
Tiered Reasoning for Intuitive Physics: Toward Verifiable Commonsense Language Understanding
  • Shane Storks, Qiaozi Gao, Yichi Zhang, Joyce Chai
  • ArXiv, 2021
Large-scale, pre-trained language models (LMs) have achieved human-level performance on a breadth of language understanding tasks. However, evaluations only based on end task performance shed little…
How Additional Knowledge can Improve Natural Language Commonsense Question Answering
TLDR
This work first categorizes external knowledge sources and shows that performance does improve when such sources are used, then explores three different strategies for knowledge incorporation and four different models for question-answering using external commonsense knowledge.
PROST: Physical Reasoning about Objects through Space and Time
TLDR
It is demonstrated that state-of-the-art pretrained models are inadequate at physical reasoning: they are influenced by the order in which answer options are presented to them, they struggle when the superlative in a question is inverted, and increasing the amount of pretraining data and parameters only yields minimal improvements.
Can RoBERTa Reason? A Systematic Approach to Probe Logical Reasoning in Language Models
  • 2020
Humans can map natural language into a logical representation that is robust to linguistic variations and useful for reasoning. While pre-trained language models (LM) have dramatically improved…

References

Showing 1-10 of 39 references
Verb Physics: Relative Physical Knowledge of Actions and Objects
TLDR
An approach to infer relative physical knowledge of actions and objects along five dimensions (e.g., size, weight, and strength) from unstructured natural language text is presented.
Social IQA: Commonsense Reasoning about Social Interactions
TLDR
Social IQa is introduced, the first large-scale benchmark for commonsense reasoning about social situations, using a new framework that mitigates stylistic artifacts in incorrect answers by asking workers to provide the right answer to a different but related question.
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
TLDR
This paper introduces the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning, and proposes Adversarial Filtering (AF), a novel procedure that constructs a de-biased dataset by iteratively training an ensemble of stylistic classifiers, and using them to filter the data.
Annotation Artifacts in Natural Language Inference Data
TLDR
It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.
From Recognition to Cognition: Visual Commonsense Reasoning
TLDR
To move towards cognition-level understanding, a new reasoning engine is presented, Recognition to Cognition Networks (R2C), that models the necessary layered inferences for grounding, contextualization, and reasoning.
HellaSwag: Can a Machine Really Finish Your Sentence?
TLDR
The construction of HellaSwag, a new challenge dataset, and its resulting difficulty, sheds light on the inner workings of deep pretrained models, and suggests a new path forward for NLP research, in which benchmarks co-evolve with the evolving state-of-the-art in an adversarial way, so as to present ever-harder challenges.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
TLDR
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale
TLDR
This work introduces WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset, and establishes new state-of-the-art results on five related benchmarks.
Language Models as Knowledge Bases?
TLDR
An in-depth analysis of the relational knowledge already present (without fine-tuning) in a wide range of state-of-the-art pretrained language models finds that BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge.
COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
TLDR
This investigation reveals promising results when implicit knowledge from deep pre-trained language models is transferred to generate explicit knowledge in commonsense knowledge graphs, and suggests that using generative commonsense models for automatic commonsense KB completion could soon be a plausible alternative to extractive methods.