Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning

Lifu Huang, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
Understanding narratives requires reading between the lines, which in turn requires interpreting the likely causes and effects of events, even when they are not mentioned explicitly. […] To establish baseline performances on Cosmos QA, we experiment with several state-of-the-art neural architectures for reading comprehension, and also propose a new architecture that improves over the competitive baselines. Experimental results demonstrate a significant gap between machine (68.4%) and human…

Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios

This study frames this task by asking multiple questions with the same set of possible endings as candidate answers, given a short story text, and discovers that even current strong pretrained language models struggle to answer the questions consistently.

Commonsense Evidence Generation and Injection in Reading Comprehension

This work proposes CEGI, a Commonsense Evidence Generation and Injection framework for reading comprehension, which injects two kinds of auxiliary commonsense evidence into comprehensive reading to equip the machine with the ability of rational thinking.

On Making Reading Comprehension More Comprehensive

This work justifies a question answering approach to reading comprehension and describes the various kinds of questions one might use to more fully test a system’s comprehension of a passage, moving beyond questions that only probe local predicate-argument structures.

elBERto: Self-supervised Commonsense Learning for Question Answering

The proposed elBERto framework achieves substantial improvements on out-of-paragraph and no-effect questions where simple lexical similarity comparison does not help, indicating that it successfully learns commonsense and is able to leverage it when given dynamic context.

Reasoning in Dialog: Improving Response Generation by Context Reading Comprehension

This paper proposes a joint framework that unifies response generation and reading comprehension, sharing the same encoder to extract the common and task-invariant features with different decoders to learn task-specific features, and augments the Transformer architecture with a memory updater, designed to selectively store and update the history dialog information so as to support downstream tasks.

COMMONGEN: Towards Generative Commonsense Reasoning via A Constrained Text Generation Challenge

A constrained natural language generation (NLG) dataset, named COMMONGEN, is presented to explicitly challenge machines in generative commonsense reasoning and shows that there is still a large gap between the current state-of-the-art pre-trained model, UniLM, and human performance.

What Makes Reading Comprehension Questions Difficult?

Crowdsourced multiple-choice reading comprehension questions for passages taken from seven qualitatively distinct sources suggest that selecting a diverse set of passages can help ensure a diverse range of question types, but that passage difficulty need not be a priority.

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning

This work introduces Cross-lingual Choice of Plausible Alternatives (XCOPA), a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages, revealing that current methods based on multilingual pretraining and zero-shot fine-tuning transfer suffer from the curse of multilinguality and fall short of performance in monolingual settings by a large margin.

CommonGen: A Constrained Text Generation Challenge for Generative Commonsense Reasoning

This work presents CommonGen, a constrained text generation task with an associated benchmark dataset that explicitly tests machines' ability for generative commonsense reasoning, and demonstrates that the learned generative commonsense reasoning capability can be transferred to improve downstream tasks such as CommonsenseQA by generating additional context.

CommonGen: A Constrained Text Generation Dataset Towards Generative Commonsense Reasoning

This work presents CommonGen: a challenging dataset for testing generative commonsense reasoning with a constrained text generation task, and provides high-quality rationales behind the reasoning process for the development and test sets from the human annotators.



The NarrativeQA Reading Comprehension Challenge

A new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts are presented, designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience.

MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text that requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.

CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge

This work presents CommonsenseQA: a challenging new dataset for commonsense question answering, which extracts from ConceptNet multiple target concepts that have the same semantic relation to a single source concept.

DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension

Experimental results on DREAM, the first dialogue-based multiple-choice reading comprehension data set to focus on in-depth multi-turn multi-party dialogue understanding, show the effectiveness of dialogue structure and general world knowledge.

MCScript: A Novel Dataset for Assessing Machine Comprehension Using Script Knowledge

This work presents a large dataset of narrative texts and questions about these texts, intended for a machine comprehension task that requires reasoning with commonsense knowledge, and shows that collecting the data via crowdsourcing yields a substantial number of inference questions.

Annotation Artifacts in Natural Language Inference Data

It is shown that a simple text categorization model can correctly classify the hypothesis alone in about 67% of SNLI and 53% of MultiNLI, and that specific linguistic phenomena such as negation and vagueness are highly correlated with certain inference classes.

NewsQA: A Machine Comprehension Dataset

NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs, is presented and analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

CoQA: A Conversational Question Answering Challenge

CoQA is introduced, a novel dataset for building Conversational Question Answering systems and it is shown that conversational questions have challenging phenomena not present in existing reading comprehension datasets (e.g., coreference and pragmatic reasoning).

Know What You Don’t Know: Unanswerable Questions for SQuAD

SQuADRUn is a new dataset that combines the existing Stanford Question Answering Dataset (SQuAD) with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.