Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions

Peter Clark, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter D. Turney, Daniel Khashabi
What capabilities are required for an AI system to pass standard 4th Grade Science Tests? […] We conclude with a detailed analysis illustrating the complementary strengths of each method in the ensemble. Our datasets are being released to enable further research.
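The ensemble of solvers described above can be sketched as a weighted combination of per-solver confidence scores over the answer options. The solver names, weights, and scores below are illustrative stand-ins, not the paper's actual configuration:

```python
# Hypothetical sketch: combine per-solver answer confidences by weighted sum.
# Solver names, weights, and scores are made up for illustration.

def ensemble_answer(solver_scores, weights):
    """solver_scores: {solver: {option: confidence}}; returns the best option."""
    combined = {}
    for solver, scores in solver_scores.items():
        w = weights.get(solver, 1.0)
        for option, conf in scores.items():
            combined[option] = combined.get(option, 0.0) + w * conf
    return max(combined, key=combined.get)

scores = {
    "retrieval":  {"A": 0.7, "B": 0.2, "C": 0.1},
    "statistics": {"A": 0.4, "B": 0.5, "C": 0.1},
    "inference":  {"A": 0.6, "B": 0.3, "C": 0.1},
}
best = ensemble_answer(scores, {"retrieval": 1.0, "statistics": 1.0, "inference": 1.0})
```

With equal weights, option "A" wins here because it accumulates the highest total confidence across solvers, even though one solver prefers "B".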


What’s in an Explanation? Characterizing Knowledge and Inference Requirements for Elementary Science Exams

This work develops an explanation-based analysis of knowledge and inference requirements, which supports a fine-grained characterization of the challenges, and compares a retrieval and an inference solver on 212 questions.

Question Answering via Integer Programming over Semi-Structured Knowledge

This work proposes a structured inference system for this task, formulated as an Integer Linear Program (ILP), that answers natural language questions using a semi-structured knowledge base derived from text, including questions requiring multi-step inference and a combination of multiple facts.
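The ILP formulation can be illustrated in miniature: binary variables select table rows, the objective rewards lexical overlap with the question and the candidate answer, and a constraint caps the number of selected rows. A real system would use an ILP solver; this brute-force sketch over a toy tablestore (with made-up rows and weights) is only illustrative:

```python
from itertools import combinations

def select_rows(rows, question_terms, answer_terms, max_rows=2):
    """Pick up to max_rows table rows maximizing term overlap with the
    question and the candidate answer (toy stand-in for the ILP objective)."""
    best_score, best_choice = float("-inf"), ()
    for k in range(1, max_rows + 1):
        for choice in combinations(range(len(rows)), k):
            covered = set().union(*(rows[i] for i in choice))
            score = (len(covered & question_terms)
                     + 2 * len(covered & answer_terms)  # answer links weigh more
                     - 0.1 * k)                         # small penalty per row
            if score > best_score:
                best_score, best_choice = score, choice
    return best_choice, best_score

rows = [
    {"metal", "conducts", "electricity"},
    {"copper", "is", "metal"},
    {"wood", "insulator"},
]
choice, score = select_rows(rows, {"copper", "electricity"}, {"conducts"})
```

Multi-step inference falls out of the objective: the two selected rows chain "copper" to "electricity" via the shared term "metal", which no single row could do alone.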

WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions supporting Multi-Hop Inference

A corpus of explanations for standardized science exams, a recent challenge task for question answering, is presented and an explanation-centered tablestore is provided, a collection of semi-structured tables that contain the knowledge to construct these elementary science explanations.

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.

Question Answering as Global Reasoning Over Semantic Abstractions

This work presents the first system that reasons over a wide range of semantic abstractions of the text, which are derived using off-the-shelf, general-purpose, pre-trained natural language modules such as semantic role labelers, coreference resolvers, and dependency parsers.

A Study of Automatically Acquiring Explanatory Inference Patterns from Corpora of Explanations: Lessons from Elementary Science Exams

This work explores the possibility of generating large explanations, averaging six facts each, by automatically extracting common explanatory patterns from a corpus of manually authored elementary science explanations, represented as lexically-connected explanation graphs grounded in a semi-structured knowledge base of tables.

Improving Question Answering with External Knowledge

This work explores simple yet effective methods for exploiting two sources of unstructured external knowledge for multiple-choice question answering in subject areas such as science.

Improving Natural Language Inference Using External Knowledge in the Science Questions Domain

A combination of techniques that harness knowledge graphs to improve performance on the NLI problem in the science questions domain, achieving new state-of-the-art performance on the SciTail science questions dataset.

Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences

This dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that require reasoning skills; human solvers achieve an F1-score of 88.1%.

Explainable Inference Over Grounding-Abstract Chains for Science Questions

This paper frames question answering as a natural language abductive reasoning problem, constructing plausible explanations for each candidate answer and then selecting the candidate with the best explanation as the final answer by employing a linear programming formalism.

Exploring Markov Logic Networks for Question Answering

A system that reasons with knowledge derived from textbooks, represented in a subset of first-order logic called Praline, which demonstrates a 15% accuracy boost and a 10x reduction in runtime compared to other MLN-based methods, and comparable accuracy to word-based baseline approaches.

Automatic Construction of Inference-Supporting Knowledge Bases

This paper describes work on automatically constructing an inferential knowledge base and applying it to a question-answering task, discussing several challenges this approach poses and the innovative, partial solutions developed for them.

Learning to Rank Answers to Non-Factoid Questions from Web Collections

This work shows that it is possible to exploit existing large collections of question–answer pairs to extract such features and train ranking models which combine them effectively, providing some of the most compelling evidence to date that complex linguistic features such as word senses and semantic roles can have a significant impact on large-scale information retrieval tasks.
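Pairwise learning to rank, as used for this kind of answer ranking, can be sketched with a tiny perceptron trained on (better, worse) answer pairs. The feature names and training pairs below are hypothetical, not those used in the cited work:

```python
# Illustrative pairwise ranking sketch: a perceptron updated whenever the
# model fails to rank the better answer above the worse one.
# Features and data are made up for illustration.

def train_pairwise(pairs, n_features, epochs=20, lr=0.1):
    """pairs: list of (features_better, features_worse) vectors."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            margin = sum(wi * (b - c) for wi, b, c in zip(w, better, worse))
            if margin <= 0:  # ranked wrong (or tied): nudge the weights
                w = [wi + lr * (b - c) for wi, b, c in zip(w, better, worse)]
    return w

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# hypothetical features: [term overlap with question, answer length]
pairs = [([0.9, 0.2], [0.1, 0.8]), ([0.7, 0.1], [0.3, 0.9])]
w = train_pairwise(pairs, 2)
```

After training, the learned weights score the preferred answer in each pair above its alternative; richer features (word senses, semantic roles) slot into the same feature-vector interface.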

Project Halo Update - Progress Toward Digital Aristotle

The design and evaluation results for a system called AURA are presented, which enables domain experts in physics, chemistry, and biology to author a knowledge base, and then allows a different set of users to ask novel questions against that knowledge base.

Open question answering over curated and extracted knowledge bases

This paper presents OQA, the first approach to leverage both curated and extracted KBs, and demonstrates that it achieves up to twice the precision and recall of a state-of-the-art Open QA system.

Elementary School Science and Math Tests as a Driver for AI: Take the Aristo Challenge!

This work pursues a specific version of this challenge, namely having the computer pass Elementary School Science and Math exams, the most difficult of which require significant progress in AI.

My Computer Is an Honor Student - but How Intelligent Is It? Standardized Tests as a Measure of AI

It is argued that machine performance on standardized tests should be a key component of any new measure of AI, because attaining a high level of performance requires solving significant AI problems involving language understanding and world modeling - critical skills for any machine that lays claim to intelligence.

Markov logic networks

Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach to combining first-order logic and probabilistic graphical models in a single representation.
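A Markov logic network assigns each possible world a probability proportional to exp of the summed weights of the ground formulas it satisfies. This toy sketch enumerates all worlds over two ground atoms with the textbook Smokes ⇒ Cancer rule; the atoms and weights are illustrative, not a real knowledge base:

```python
from itertools import product
from math import exp

# Minimal MLN sketch: P(world) is proportional to exp(sum of weights of
# satisfied ground formulas). Tractable only because the toy domain has
# two ground atoms; real MLNs need approximate inference.

def query_prob(query_atom, atoms, formulas):
    """formulas: list of (weight, predicate over a world dict)."""
    world_weights = {}
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        total = sum(w for w, holds in formulas if holds(world))
        world_weights[values] = exp(total)
    z = sum(world_weights.values())                      # partition function
    idx = atoms.index(query_atom)
    return sum(v for k, v in world_weights.items() if k[idx]) / z

atoms = ["Smokes(a)", "Cancer(a)"]
formulas = [
    (1.5, lambda w: (not w["Smokes(a)"]) or w["Cancer(a)"]),  # Smokes => Cancer
    (0.5, lambda w: w["Smokes(a)"]),                          # prior on smoking
]
p = query_prob("Cancer(a)", atoms, formulas)
```

Because the implication rule carries positive weight, worlds where Cancer(a) holds are favored whenever Smokes(a) does, so the query probability lands above chance without being certain.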

A probabilistic graphical model for joint answer ranking in question answering

A probabilistic graphical model is applied for answer ranking in question answering which estimates the joint probability of correctness of all answer candidates, from which the probability of correctness of an individual candidate can be inferred.
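The joint-then-marginal idea can be sketched as follows: define a distribution over correctness bit-vectors for all candidates, couple them with a constraint (here, exactly one correct answer, a simplifying assumption not taken from the cited work), and read off each candidate's marginal. Candidate scores are made up:

```python
from itertools import product
from math import exp

# Toy sketch of joint answer ranking: enumerate joint correctness
# assignments, keep only those satisfying the exactly-one-correct
# constraint, and marginalize to per-candidate probabilities.

def marginals(scores):
    joint = {}
    for bits in product([0, 1], repeat=len(scores)):
        if sum(bits) != 1:  # hard constraint: exactly one correct answer
            continue
        joint[bits] = exp(sum(s for s, b in zip(scores, bits) if b))
    z = sum(joint.values())
    return [sum(w for bits, w in joint.items() if bits[i]) / z
            for i in range(len(scores))]

probs = marginals([2.0, 1.0, 0.5])
```

Under this constraint the marginals reduce to a softmax over candidate scores; relaxing it to soft couplings between candidates is what makes the joint model richer than scoring each answer independently.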

Question Answering Using Enhanced Lexical Semantic Models

This work focuses on improving the performance using models of lexical semantic resources and shows that these systems can be consistently and significantly improved with rich lexical semantics information, regardless of the choice of learning algorithms.