• Corpus ID: 3922816

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord
We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering. […] The dataset contains only natural, grade-school science questions (authored for human tests), and is the largest public-domain set of this kind (7,787 questions). We test several baselines on the Challenge Set, including leading neural models from the SQuAD and SNLI tasks, and find that none are able to significantly outperform a random baseline, reflecting the difficult…

Answering Science Exam Questions Using Query Rewriting with Background Knowledge

A system that rewrites a given question into queries that are used to retrieve supporting text from a large corpus of science-related text is presented and is able to outperform several strong baselines on the ARC dataset.

Getting Closer to AI Complete Question Answering: A Set of Prerequisite Real Tasks

QuAIL is presented, the first RC dataset to combine text-based, world knowledge and unanswerable questions, and to provide question type annotation that would enable diagnostics of the reasoning strategies by a given QA system.

KG^2: Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings

This paper proposes a novel framework for answering science exam questions that mimics the human problem-solving process in an open-book exam and outperforms the previous state-of-the-art QA systems.

Reasoning-Driven Question-Answering for Natural Language Understanding

This thesis proposes a formulation for abductive reasoning in natural language and shows its effectiveness, especially in domains with limited training data, and presents the first formal framework for multi-step reasoning algorithms, in the presence of a few important properties of language use.

Visuo-Linguistic Question Answering (VLQA) Challenge

This work proposes a novel task of deriving joint inference about a given image-text modality, compiles the Visuo-Linguistic Question Answering (VLQA) challenge corpus in a question answering setting, and argues that VLQA will be a good benchmark for reasoning over a visuo-linguistic context.

A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset

This work proposes a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset and demonstrates that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the (ARC) corpus.

Neural Natural Logic Inference for Interpretable Question Answering

This paper investigates a neural-symbolic QA approach that integrates natural logic reasoning within deep learning architectures, towards developing effective and yet explainable question answering models.

Exploring ways to incorporate additional knowledge to improve Natural Language Commonsense Question Answering

This work identifies external knowledge sources and shows that performance improves further when a set of facts retrieved through IR is prepended to each MCQ question during both the training and test phases. It also presents three different modes of passing knowledge and five different models of using knowledge, including the standard BERT MCQ model.

Improving Retrieval-Based Question Answering with Deep Inference Models

The proposed two-step model outperforms the best retrieval-based solver by over 3% in absolute accuracy and can answer both simple, factoid questions and more complex questions that require reasoning or inference.

Advances in Automatically Solving the ENEM

This work builds on a previous solution that formulated the problem of answering purely textual multiple-choice questions from the ENEM as a text information retrieval problem, and investigates how to enhance these methods through text augmentation using word embeddings and WordNet, a structured lexical database in which words are connected by semantic relations.



Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.

Question Answering via Integer Programming over Semi-Structured Knowledge

This work proposes a structured inference system for this task, formulated as an Integer Linear Program (ILP), that answers natural language questions using a semi-structured knowledge base derived from text, including questions requiring multi-step inference and a combination of multiple facts.

Question Answering as Global Reasoning Over Semantic Abstractions

This work presents the first system that reasons over a wide range of semantic abstractions of the text, which are derived using off-the-shelf, general-purpose, pre-trained natural language modules such as semantic role labelers, coreference resolvers, and dependency parsers.

SciTaiL: A Textual Entailment Dataset from Science Question Answering

A new dataset and model for textual entailment, derived from treating multiple-choice question-answering as an entailment problem, is presented, and it is demonstrated that one can improve accuracy on SCITAIL by 5% using a new neural model that exploits linguistic structure.

Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions

This paper describes an alternative approach that operates at three levels of representation and reasoning: information retrieval, corpus statistics, and simple inference over a semi-automatically constructed knowledge base, to achieve substantially improved results.

MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text that requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.

Query-Reduction Networks for Question Answering

Query-Reduction Network (QRN), a variant of Recurrent Neural Network (RNN) that effectively handles both short-term and long-term sequential dependencies to reason over multiple facts, is proposed.

SQuAD: 100,000+ Questions for Machine Comprehension of Text

A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).

Answering Complex Questions Using Open Information Extraction

This work develops a new inference model for Open IE that can work effectively with multiple short facts, noise, and the relational structure of tuples, and significantly outperforms a state-of-the-art structured solver on complex questions of varying difficulty.

NewsQA: A Machine Comprehension Dataset

NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs, is presented and analysis confirms that NewsQA demands abilities beyond simple word matching and recognizing textual entailment.