
Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge

Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark
We present the ARC-DA dataset, a direct-answer (“open response”, “freeform”) version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset. While ARC has been influential in the community, its multiple-choice format is unrepresentative of real-world questions, and multiple-choice formats can be particularly susceptible to artifacts. The ARC-DA dataset addresses these concerns by converting questions to direct-answer format using a combination of crowdsourcing and expert review. The…
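Direct-answer questions have no fixed answer options, so systems are typically scored by string overlap against one or more reference answers. Below is a minimal sketch of SQuAD-style token-level F1 scoring, a common choice for direct-answer QA; the official ARC-DA evaluation may use a different or additional metric, so treat this as illustrative only.

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted and a gold answer (SQuAD-style)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection: shared tokens, counting duplicates.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def best_f1(prediction: str, golds: list[str]) -> float:
    """Direct-answer questions often list several acceptable answers;
    score a prediction against the best-matching reference."""
    return max(token_f1(prediction, g) for g in golds)
```

For example, `best_f1("the water cycle", ["water cycle", "evaporation"])` yields 0.8: two of three predicted tokens overlap the best reference (precision 2/3, recall 1).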

Citations of this paper

GooAQ: Open Question Answering with Diverse Answer Types
This work presents GOOAQ, a large-scale dataset with a variety of answer types, containing both textual answers (short and long) as well as more structured ones such as collections, and releases it to facilitate further research on improving QA with diverse response types.
General-Purpose Question-Answering with Macaw
This work presents MACAW, a versatile, generative question-answering (QA) system built on UnifiedQA that exhibits strong zero-shot performance on a wide variety of topics, including outperforming GPT-3 by over 10% (absolute) on Challenge300.
TruthfulQA: Measuring How Models Mimic Human Falsehoods
It is suggested that scaling up models alone is less promising for improving truthfulness than fine-tuning using training objectives other than imitation of text from the web.
Enriching a Model's Notion of Belief using a Persistent Memory
This work adds a memory component, a BeliefBank, that records a model’s answers, along with two mechanisms that use it to improve consistency among beliefs, and shows that, in a controlled experimental setting, these mechanisms improve both accuracy and consistency.
References
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.
Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering
This paper proposes a retriever-reader model that learns to attend on essential terms during the question answering process, building an essential-term selector that first identifies the most important words in a question, then reformulates the query and searches for related evidence.
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
This work presents CommonsenseQA: a challenging new dataset for commonsense question answering, which extracts from ConceptNet multiple target concepts that have the same semantic relation to a single source concept.
QASC: A Dataset for Question Answering via Sentence Composition
This work presents a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question, and presents a two-step approach to mitigate the retrieval challenges.
WorldTree V2: A Corpus of Science-Domain Structured Explanations and Inference Patterns supporting Multi-Hop Inference
This work presents the second iteration of the WorldTree project, a corpus of 5,114 standardized science exam questions paired with large detailed multi-fact explanations that combine core scientific knowledge and world knowledge, and uses this explanation corpus to author a set of 344 high-level science domain inference patterns similar to semantic frames supporting multi-hop inference.
Answering Science Exam Questions Using Query Reformulation with Background Knowledge
This paper presents a system that reformulates a given question into queries that are used to retrieve supporting text from a large corpus of science-related text and outperforms several strong baselines on the ARC dataset.
A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset
This work proposes a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset and demonstrates that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the (ARC) corpus.
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
It is shown that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
SQuAD: 100,000+ Questions for Machine Comprehension of Text
A strong logistic regression model is built, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%).
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
It is shown that, in comparison to other recently introduced large-scale datasets, TriviaQA has relatively complex, compositional questions, has considerable syntactic and lexical variability between questions and corresponding answer-evidence sentences, and requires more cross-sentence reasoning to find answers.