Unsupervised Commonsense Question Answering with Self-Talk

Natural language understanding involves reading between the lines with implicit background knowledge. Current systems either rely on pre-trained language models as the sole implicit source of world knowledge, or resort to external knowledge bases (KBs) to incorporate additional relevant knowledge. We propose an unsupervised framework based on self-talk as a novel alternative to multiple-choice commonsense tasks. Inspired by inquiry-based discovery learning (Bruner, 1961), our approach…

Comprehension Based Question Answering using Bloom’s Taxonomy

This work uses Bloom’s Taxonomy to supply proximal context relevant to each question, and shows that targeting context in this manner improves performance across four popular commonsense question answering datasets.

Answer-level Calibration for Free-form Multiple Choice Question Answering

This work presents ALC (Answer-Level Calibration), where the main suggestion is to model context-independent biases in terms of the probability of a choice without the associated context, and to subsequently remove them using an unsupervised estimate of similarity with the full context.

Improving Unsupervised Commonsense Reasoning Using Knowledge-Enabled Natural Language Inference

This work shows the effectiveness of using a common framework, Natural Language Inference (NLI), to solve diverse commonsense reasoning tasks, by leveraging transfer learning from large NLI datasets, and injecting crucial knowledge from commonsense sources such as ATOMIC 2020 and ConceptNet.

A Systematic Investigation of Commonsense Understanding in Large Language Models

It is found that the impressive zero-shot performance of large language models is mostly due to the existence of dataset bias in the authors' benchmarks, and that leveraging explicit commonsense knowledge does not yield substantial improvement.

Do Language Models Learn Commonsense Knowledge?

Language models (LMs) trained on large amounts of data (e.g., Brown et al., 2020; Patwary et al., 2021) have shown impressive performance on many NLP tasks under the zero-shot and few-shot setup.

Think Before You Speak: Using Self-talk to Generate Implicit Commonsense Knowledge for Response Generation

This paper presents a self-talk approach that first generates the implicit commonsense knowledge and then generates a response by referencing the externalized knowledge, all using one generative model.

Prompting Contrastive Explanations for Commonsense Reasoning Tasks

Inspired by the contrastive nature of human explanations, large pretrained language models are used to complete explanation prompts which contrast alternatives according to the key attribute(s) required to justify the correct answer.

minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models

The minicons library is described and applied to two motivating case studies: one focusing on the learning dynamics of the BERT architecture on relative grammatical judgments, and the other on benchmarking 23 different LMs on zero-shot abductive reasoning.



XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and overcomes the limitations of BERT thanks to its autoregressive formulation.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT, a new language representation model, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Interpretation of Natural Language Rules in Conversational Machine Reading

This paper formalises this task and develops a crowd-sourcing strategy to collect 37k task instances based on real-world rules and crowd-generated questions and scenarios to assess its difficulty by evaluating the performance of rule-based and machine-learning baselines.