WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge

Hongming Zhang, Xinran Zhao and Yangqiu Song. Annual Meeting of the Association for Computational Linguistics.
In this paper, we present the first comprehensive categorization of essential commonsense knowledge for answering the Winograd Schema Challenge (WSC). For each question, we invite annotators to first provide reasons for making correct decisions and then categorize them into six major knowledge categories. By doing so, we better understand the limitations of existing methods (i.e., what kinds of knowledge cannot be effectively represented or inferred with existing methods) and shed some…

WinoLogic: A Zero-Shot Logic-based Diagnostic Dataset for Winograd Schema Challenge

A logic-based framework that focuses on high-quality commonsense knowledge: it identifies and collects formal knowledge formulas verified by theorem provers, translates such formulas into natural-language sentences, and proposes a new dataset, WinoLogic, built from these sentences.

CIKQA: Learning Commonsense Inference with a Unified Knowledge-in-the-loop QA Paradigm

This work investigates models’ commonsense inference capabilities from two perspectives: (1) whether models can know if the knowledge they have is enough to solve the task; and (2) whether models can learn commonsense inference capabilities that generalize across commonsense tasks.

Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema

It is suggested that the apparent progress on WS may not necessarily reflect progress in commonsense reasoning: the observed progress is mostly due to the use of supervision in training WS models, which is not likely to successfully support all the required commonsense reasoning skills and knowledge.

ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning

This work presents ExplaGraphs, a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction, and proposes a multi-level evaluation framework that checks for the structural and semantic correctness of the generated graphs and their degree of match with ground-truth graphs.

Dimensions of Commonsense Knowledge

Improving Unsupervised Commonsense Reasoning Using Knowledge-Enabled Natural Language Inference

This work shows the effectiveness of using a common framework, Natural Language Inference (NLI), to solve diverse commonsense reasoning tasks, by leveraging transfer learning from large NLI datasets, and injecting crucial knowledge from commonsense sources such as ATOMIC 2020 and ConceptNet.
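The core move in this line of work can be sketched as recasting a multiple-choice question into NLI premise/hypothesis pairs, one per answer choice, so that an off-the-shelf entailment model can score each choice. The function name and example below are illustrative, not taken from the paper.

```python
def qa_to_nli(context, question, choice):
    """Convert one answer choice of a commonsense question into an NLI
    premise/hypothesis pair; an entailment model then scores each pair
    and the highest-scoring choice is selected."""
    premise = context
    hypothesis = question.replace("___", choice)
    return premise, hypothesis

pair = qa_to_nli(
    "Ann put the cake in the oven.",
    "The cake is now ___.",
    "baking")
print(pair)  # → ('Ann put the cake in the oven.', 'The cake is now baking.')
```

In the full pipeline, each such pair would be fed to an NLI model trained on large entailment datasets, optionally with premises augmented by ATOMIC 2020 or ConceptNet facts.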

Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand?

This work formalizes ways in which ungrounded language models appear to be fundamentally limited in their ability to “understand”, and suggests that assertions in code or language do not provide sufficient signal to fully emulate semantic representations.

Prompting Contrastive Explanations for Commonsense Reasoning Tasks

Inspired by the contrastive nature of human explanations, large pretrained language models are used to complete explanation prompts which contrast alternatives according to the key attribute(s) required to justify the correct answer.
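A minimal sketch of the prompt-construction step: the correct answer is contrasted with a foil in a "rather than … because" template that a pretrained language model is then asked to complete. The template wording and example question here are invented for illustration.

```python
def contrastive_prompt(question, answer, foil):
    """Build a contrastive explanation prompt; a language model completes
    the sentence after 'because' to justify the correct answer."""
    return f"{question} The answer is {answer} rather than {foil} because"

p = contrastive_prompt(
    "Where would you store a spare blanket?", "a closet", "a freezer")
print(p)
```

The completion generated after "because" serves as the contrastive explanation and can also be fed back to the task model as additional evidence.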

Teach Me to Explain: A Review of Datasets for Explainable NLP

This review identifies three predominant classes of explanations (highlights, free-text, and structured), organizes the literature on annotating each type, points to what has been learned to date, and gives recommendations for collecting EXNLP datasets in the future.

On the Diversity and Limits of Human Explanations

Inspired by prior work in psychology and the cognitive sciences, existing human explanations in NLP are grouped into three categories: proximal mechanism, evidence, and procedure, which differ in nature and have implications for the resulting explanations.

A Simple Method for Commonsense Reasoning

Key to this method is the use of language models, trained on a massive amount of unlabeled data, to score multiple-choice questions posed by commonsense reasoning tests; it outperforms previous state-of-the-art methods by a large margin.
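The scoring idea can be sketched as candidate substitution: each answer candidate is substituted into the pronoun slot, the resulting full sentence is scored by a language model, and the higher-scoring substitution wins. Below, a toy bigram table stands in for a real language model; the probabilities are invented purely for demonstration.

```python
# Toy bigram log-probabilities standing in for a real language model;
# unseen bigrams get a low default score.
BIGRAM_LOGP = {("trophy", "was"): -1.0, ("suitcase", "was"): -3.0}

def toy_log_prob(sentence):
    """Sum bigram log-probabilities over the sentence."""
    words = sentence.lower().replace(".", "").split()
    return sum(BIGRAM_LOGP.get(bg, -5.0) for bg in zip(words, words[1:]))

def resolve_pronoun(template, candidates):
    """Substitute each candidate into the pronoun slot and keep the
    substitution the (toy) language model scores highest."""
    return max(candidates, key=lambda c: toy_log_prob(template.format(c)))

template = "The trophy did not fit in the suitcase because the {} was too heavy."
print(resolve_pronoun(template, ["trophy", "suitcase"]))  # → trophy
```

With a real pretrained model, `toy_log_prob` would be replaced by the sum of token log-probabilities under that model; the substitution-and-compare logic is unchanged.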

A Surprisingly Robust Trick for the Winograd Schema Challenge

This paper shows that the performance of three language models on WSC273 strongly improves when fine-tuned on a similar pronoun disambiguation problem dataset (denoted WSCR), and generates a large unsupervised WSC-like dataset.

WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale

This work introduces WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset, and establishes new state-of-the-art results on five related benchmarks.

CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

This work presents CommonsenseQA: a challenging new dataset for commonsense question answering, which extracts from ConceptNet multiple target concepts that have the same semantic relation to a single source concept.
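The construction step can be sketched over a toy triple store: target concepts sharing the same (source, relation) pair are grouped, so that one can serve as the answer and the others as distractors. The triples below are invented examples in ConceptNet's (source, relation, target) style.

```python
from collections import defaultdict

# Toy ConceptNet-style triples: (source concept, relation, target concept).
TRIPLES = [
    ("river", "AtLocation", "valley"),
    ("river", "AtLocation", "bridge"),
    ("river", "AtLocation", "waterfall"),
    ("river", "UsedFor", "fishing"),
]

def answer_sets(triples, min_targets=3):
    """Group target concepts that share the same (source, relation) pair,
    mirroring how answer/distractor sets are mined from ConceptNet."""
    groups = defaultdict(list)
    for src, rel, tgt in triples:
        groups[(src, rel)].append(tgt)
    # Keep groups with enough targets: one becomes the answer,
    # the rest serve as distractors.
    return {k: v for k, v in groups.items() if len(v) >= min_targets}

print(answer_sets(TRIPLES))
```

In the actual dataset, crowd workers then author a question whose answer is one target concept but not the others in the same group.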

KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning

This paper proposes a textual inference framework for answering commonsense questions, which effectively utilizes external, structured commonsense knowledge graphs to perform explainable inferences.

Combing Context and Commonsense Knowledge Through Neural Networks for Solving Winograd Schema Problems

A general framework that combines context and commonsense knowledge for solving the Winograd Schema (WS) and Pronoun Disambiguation Problems (PDP), together with two methods for solving WS and PDP problems, is proposed.

Yuanfudao at SemEval-2018 Task 11: Three-way Attention and Relational Knowledge for Commonsense Machine Comprehension

This paper describes the system for SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge, which uses Three-way Attentive Networks (TriAN) to model interactions between the passage, question, and answers, and augments the input with relation embeddings from the general-knowledge graph ConceptNet.

Commonsense Knowledge Aware Conversation Generation with Graph Attention

This is the first attempt to use large-scale commonsense knowledge in conversation generation; unlike existing models that use knowledge triples (entities) separately and independently, this model treats each knowledge graph as a whole, encoding the more structured, connected semantic information in the graphs.

ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning

Experimental results demonstrate that multitask models that incorporate the hierarchical structure of if-then relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation.
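ATOMIC's if-then knowledge can be pictured as events mapped to typed inferences. The relation names below (xIntent, xEffect, oReact) follow the ATOMIC schema, but the event and inference strings are invented for demonstration.

```python
# Illustrative ATOMIC-style if-then store: event -> relation -> inferences.
ATOMIC_SAMPLE = {
    "PersonX pays the bill": {
        "xIntent": ["to be generous"],        # why PersonX acted
        "xEffect": ["PersonX has less money"],  # effect on PersonX
        "oReact": ["others feel grateful"],   # how others feel
    },
}

def infer(event, relation):
    """Look up if-then inferences for an event under one relation type."""
    return ATOMIC_SAMPLE.get(event, {}).get(relation, [])

print(infer("PersonX pays the bill", "oReact"))  # → ['others feel grateful']
```

The multitask models in the paper learn to generate such inferences, sharing structure across related relation types rather than training one model per relation in isolation.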

ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

A new version of the linked open data resource ConceptNet is presented that is particularly well suited to be used with modern NLP techniques such as word embeddings, with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.
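The SAT-style analogy evaluation reduces to vector arithmetic over word embeddings: solve a : b :: c : ? by finding the vocabulary word nearest to b - a + c. The 2-d toy vectors below are invented; real systems would use embeddings such as ConceptNet Numberbatch.

```python
# Toy 2-d embeddings; real evaluations use high-dimensional vectors.
VECS = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "king": [2.0, 0.0], "queen": [2.0, 1.0],
}

def analogy(a, b, c, vocab):
    """Solve a : b :: c : ? by nearest neighbor to b - a + c,
    excluding the three query words themselves."""
    target = [vb - va + vc for va, vb, vc in zip(VECS[a], VECS[b], VECS[c])]

    def sq_dist(word):
        return sum((t - v) ** 2 for t, v in zip(target, VECS[word]))

    candidates = [w for w in vocab if w not in (a, b, c)]
    return min(candidates, key=sq_dist)

print(analogy("man", "woman", "king", VECS))  # → queen
```

Embeddings that place related concepts consistently in this space score well on such analogies, which is the intrinsic evaluation the paper reports improvements on.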