ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning
- Maarten Sap, Ronan Le Bras, Yejin Choi
- AAAI Conference on Artificial Intelligence
- 2019
Experimental results demonstrate that multitask models incorporating the hierarchical structure of if-then relation types yield more accurate inference than models trained in isolation, as measured by both automatic and human evaluation.
Social IQA: Commonsense Reasoning about Social Interactions
- Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, Yejin Choi
- Conference on Empirical Methods in Natural Language Processing
- 2019
Social IQa, the first large-scale benchmark for commonsense reasoning about social situations, is shown to be challenging for existing question-answering models based on pretrained language models, which trail human performance by a gap of more than 20%.
WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale
- Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
- AAAI Conference on Artificial Intelligence
- 2019
This work introduces WinoGrande, a large-scale dataset of 44k problems inspired by the original WSC design but adjusted to improve both the scale and the hardness of the dataset, and establishes new state-of-the-art results on five related benchmarks.
PIQA: Reasoning about Physical Commonsense in Natural Language
- Yonatan Bisk, Rowan Zellers, Ronan Le Bras, Jianfeng Gao, Yejin Choi
- AAAI Conference on Artificial Intelligence
- 2019
The task of physical commonsense reasoning and a corresponding benchmark dataset, Physical Interaction: Question Answering (PIQA), are introduced, along with an analysis of the dimensions of knowledge that existing models lack, which offers significant opportunities for future research.
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
- Jena D. Hwang, Chandra Bhagavatula, Yejin Choi
- AAAI Conference on Artificial Intelligence
- 2020
It is argued that manually constructed CSKGs will never achieve the coverage necessary to be applicable in all situations encountered by NLP agents, and a new evaluation framework is proposed for testing the utility of KGs based on how effectively implicit knowledge representations can be learned from them.
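Concretely, the paper's companion model family (COMET) learns such implicit representations by fine-tuning a language model on KG triples rendered as text. Below is a minimal sketch of that triple-to-text formatting; the bracketed relation token and the example triple are illustrative assumptions, not the exact training format.

```python
# A minimal sketch of rendering KG triples as text for LM fine-tuning
# (the COMET recipe in outline). The bracketed relation token and the
# example triple are illustrative assumptions, not the exact format.
def triple_to_example(head: str, relation: str, tail: str) -> str:
    # The LM is trained to generate the tail given the head and relation.
    return f"{head} [{relation}] {tail}"

print(triple_to_example("PersonX pays for PersonY's lunch",
                        "xIntent", "to be generous"))
# PersonX pays for PersonY's lunch [xIntent] to be generous
```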
Abductive Commonsense Reasoning
- Chandra Bhagavatula, Ronan Le Bras, Yejin Choi
- International Conference on Learning Representations
- 2019
This study introduces a challenge dataset, ART, consisting of over 20k commonsense narrative contexts and 200k explanations, and conceptualizes two new tasks -- Abductive NLI, a multiple-choice question-answering task for choosing the more likely explanation, and Abductive NLG, a conditional generation task for explaining given observations in natural language.
Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning
- Lifu Huang, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
- Conference on Empirical Methods in Natural Language Processing
- 2019
This paper introduces Cosmos QA, a large-scale dataset of 35,600 problems that require commonsense-based reading comprehension, formulated as multiple-choice questions, and proposes a new architecture that improves over competitive baselines.
Unsupervised Commonsense Question Answering with Self-Talk
- Vered Shwartz, Peter West, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
- Conference on Empirical Methods in Natural Language Processing
- 2020
An unsupervised framework based on self-talk, inspired by inquiry-based discovery learning, is presented as a novel approach to multiple-choice commonsense tasks; it improves performance on several benchmarks and competes with models that obtain knowledge from external KBs.
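As a rough illustration of the self-talk recipe, the sketch below uses GPT-2 via Hugging Face transformers: the model is prompted to ask and answer its own clarification questions, and the enriched context is then used to score answer choices by language-model loss. The prefixes, hyperparameters, and helper names are illustrative assumptions, not the paper's exact prompts or code.

```python
# A minimal sketch of self-talk inference, assuming GPT-2 via the
# Hugging Face transformers library; prefixes and hyperparameters
# here are illustrative, not the paper's exact configuration.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_score(text: str) -> float:
    """Average negative log-likelihood of `text` under the LM (lower is better)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return loss.item()

def generate(prompt: str, max_new_tokens: int = 20) -> str:
    """Sample a short continuation of `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens,
                         do_sample=True, top_p=0.9,
                         pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

def self_talk_answer(context: str, question: str, choices: list[str]) -> str:
    # 1) Prompt the model to ask clarification questions from fixed prefixes,
    # 2) let it answer them, 3) append the answers as extra context.
    prefixes = ["What is the definition of", "What is the purpose of"]
    clarifications = []
    for p in prefixes:
        q = p + generate(f"{context} {p}")
        clarifications.append(generate(f"{context} {q}"))
    enriched = context + " " + " ".join(clarifications)
    # 4) Pick the answer choice with the lowest LM loss in the enriched context.
    return min(choices, key=lambda c: lm_score(f"{enriched} {question} {c}"))
```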
Adversarial Filters of Dataset Biases
- Ronan Le Bras, Swabha Swayamdipta, Yejin Choi
- International Conference on Machine Learning
- 2020
This work presents extensive supporting evidence that AFLite is broadly applicable for reducing measurable dataset biases, and that models trained on the filtered datasets generalize better to out-of-distribution tasks.
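For concreteness, here is a minimal sketch of the AFLite loop as the paper describes it: repeatedly train simple linear classifiers on random partitions of precomputed embeddings, score each instance by how often it is classified correctly, and discard the most predictable instances. The hyperparameter values and the scikit-learn logistic-regression probe are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of AFLite-style adversarial filtering, assuming
# precomputed embeddings X (n x d) and labels y; hyperparameters and
# the logistic-regression probe are illustrative choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

def aflite(X, y, target_size=1000, num_partitions=64,
           train_frac=0.8, cutoff=500, threshold=0.75, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))  # indices of instances still in the dataset
    while len(idx) > target_size:
        correct = np.zeros(len(idx))  # times each instance was predicted correctly
        counted = np.zeros(len(idx))  # times each instance landed in a test split
        for _ in range(num_partitions):
            perm = rng.permutation(len(idx))
            split = int(train_frac * len(idx))
            tr, te = perm[:split], perm[split:]
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[idx[tr]], y[idx[tr]])
            correct[te] += (clf.predict(X[idx[te]]) == y[idx[te]])
            counted[te] += 1
        # Predictability score: fraction of test appearances predicted correctly.
        score = np.divide(correct, counted, out=np.zeros_like(correct),
                          where=counted > 0)
        # Remove the most predictable instances, at most `cutoff` per iteration.
        order = np.argsort(-score)
        removable = order[score[order] >= threshold][:cutoff]
        if len(removable) == 0:
            break
        idx = idx[np.setdiff1d(np.arange(len(idx)), removable)]
    return idx  # indices of the retained, harder-to-predict instances
```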
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
- A. Srivastava, Abhinav Rastogi, Uri Shaham
- arXiv
- 2022
Evaluating OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters, shows that model performance and calibration both improve with scale but remain poor in absolute terms.
...