From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project

@article{Clark2019FromT,
  title={From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project},
  author={Peter Clark and Oren Etzioni and Daniel Khashabi and Tushar Khot and Bhavana Dalvi and Kyle Richardson and Ashish Sabharwal and Carissa Schoenick and Oyvind Tafjord and Niket Tandon and Sumithra Bhakthavatsalam and Dirk Groeneveld and Michal Guerquin and Michael Schmitz},
  journal={AI Magazine},
  year={2019},
  volume={41},
  pages={39-53}
}
AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even as recently as 2016, the best AI system could achieve merely 59.3 percent on an 8th grade science exam. This article reports success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90 percent on the exam’s nondiagram, multiple choice (NDMC) questions. In addition… 

Hierarchy-Aware Multi-Hop Question Answering over Knowledge Graphs

HamQA is proposed, a novel hierarchy-aware multi-hop question answering framework over knowledge graphs that aligns the mutual hierarchical information between question contexts and KGs and uses a context-aware graph attention network to capture context information.

GNN is a Counter? Revisiting GNN for Question Answering

It is discovered that even a very simple graph neural counter can outperform all existing GNN modules on CommonsenseQA and OpenBookQA, two popular QA benchmarks that rely heavily on knowledge-aware reasoning.
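
A minimal sketch of the "counter" idea, assuming per-edge feature vectors for each question-choice subgraph: a single learned scorer acts as a soft "+1" per edge, and summing the scores amounts to counting evidence. The shapes and the linear scorer are illustrative stand-ins, not the paper's exact module.

```python
import torch
import torch.nn as nn

class GraphSoftCounter(nn.Module):
    """Score an answer choice by a learned soft count over its subgraph's edges."""
    def __init__(self, d_edge: int):
        super().__init__()
        self.edge_score = nn.Linear(d_edge, 1)  # soft "+1" per relevant edge

    def forward(self, edge_feats: torch.Tensor) -> torch.Tensor:
        # edge_feats: (num_edges, d_edge) for one question-choice subgraph;
        # summing per-edge scores is counting with learned weights
        return self.edge_score(edge_feats).sum()

counter = GraphSoftCounter(d_edge=64)
scores = torch.stack([counter(torch.randn(12, 64)),   # choice A's subgraph
                      counter(torch.randn(7, 64))])   # choice B's subgraph
print(scores.argmax())  # pick the choice with the most (soft) evidence
```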

GreaseLM: Graph REASoning Enhanced Language Models for Question Answering

GreaseLM is a new model that fuses encoded representations from pretrained LMs and graph neural networks over multiple layers of modality interaction operations, allowing language context representations to be grounded by structured world knowledge, and allowing linguistic nuances in the context to inform the graph representations of knowledge.
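
A minimal sketch of the fusion idea, assuming a PyTorch-style implementation: each layer updates the text and graph streams separately, then lets them exchange information through a pair of special interaction slots. The module names, dimensions, and single-linear-layer "GNN" are illustrative stand-ins, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """One GreaseLM-style layer (sketch): update text and graph, then mix them."""
    def __init__(self, d_text: int, d_graph: int):
        super().__init__()
        self.text_layer = nn.TransformerEncoderLayer(d_model=d_text, nhead=8,
                                                     batch_first=True)
        self.graph_layer = nn.Linear(d_graph, d_graph)  # stand-in for one GNN step
        self.mix = nn.Linear(d_text + d_graph, d_text + d_graph)

    def forward(self, tokens: torch.Tensor, nodes: torch.Tensor):
        # tokens: (B, T, d_text), slot 0 is the special interaction token
        # nodes:  (B, N, d_graph), slot 0 is the special interaction node
        tokens = self.text_layer(tokens)
        nodes = torch.relu(self.graph_layer(nodes))
        joint = torch.tanh(self.mix(torch.cat([tokens[:, 0], nodes[:, 0]], dim=-1)))
        t_int, n_int = joint.split([tokens.size(-1), nodes.size(-1)], dim=-1)
        # write the mixed information back into both modalities
        tokens = torch.cat([t_int.unsqueeze(1), tokens[:, 1:]], dim=1)
        nodes = torch.cat([n_int.unsqueeze(1), nodes[:, 1:]], dim=1)
        return tokens, nodes

# stacking several layers lets language context and graph knowledge inform each other
layers = nn.ModuleList([FusionLayer(768, 128) for _ in range(5)])
tokens, nodes = torch.randn(2, 32, 768), torch.randn(2, 40, 128)
for layer in layers:
    tokens, nodes = layer(tokens, nodes)
```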

ScienceWorld: Is your Agent Smarter than a 5th Grader?

We present ScienceWorld, a benchmark to test agents’ scientific reasoning abilities in a new interactive text environment at the level of a standard elementary school science curriculum. Despite the…

Humans Keep It One Hundred: an Overview of AI Journey

The results of AI Journey, a competition of AI systems aimed at improving AI performance on knowledge bases, reasoning, and text generation, are described, showing different approaches to task understanding and reasoning.

GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering

GrapeQA is proposed with two simple improvements to the Working Graph; it shows consistent gains over its LM + KG predecessor (QA-GNN in particular) and large improvements on OpenBookQA.

14 Question Answering and Information Retrieval

The quest for knowledge is deeply human, and so it is not surprising that practically as soon as there were computers we were asking them questions. By the early 1960s, systems used the two major…

Ranking Facts for Explaining Answers to Elementary Science Questions

Considering automated reasoning for elementary science question answering, this work addresses the novel task of generating explanations for answers from human-authored facts, using a practically scalable framework of feature-rich support vector machines that leverage domain-targeted, hand-crafted features.
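
A rough sketch of the ranking recipe with scikit-learn, assuming each (question, fact) pair has been reduced to a small hand-crafted feature vector whose SVM decision score orders the candidate facts. The features and data below are invented placeholders, not the paper's domain-targeted feature set.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Each row: hypothetical features for one (question, fact) pair,
# e.g. [lexical overlap, shared science terms, domain match]
X_train = np.array([[0.9, 3, 1.0],
                    [0.1, 0, 0.0],
                    [0.7, 2, 1.0],
                    [0.2, 1, 0.0]])
y_train = np.array([1, 0, 1, 0])  # 1 = fact belongs in the explanation

svm = LinearSVC().fit(X_train, y_train)

# Rank candidate facts for a new question by the SVM's decision score
candidates = np.array([[0.8, 2, 1.0],
                       [0.3, 1, 0.0]])
scores = svm.decision_function(candidates)
ranked = np.argsort(-scores)  # best explanatory fact first
print(ranked, scores)
```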

ACENet: Attention Guided Commonsense Reasoning on Hybrid Knowledge Graph

An Attention guided Commonsense rEasoning Network (ACENet) is proposed to endow the neural network with the capability of integrating hybrid knowledge and applies the multi-layer interaction of answer choices to continually strengthen correct choice information and guide the message passing of GNN.
...

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
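
A minimal fine-tuning sketch using the Hugging Face transformers library (a choice of this example, not something the paper prescribes): the pretrained encoder plus one new classification head, trained end to end.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# Adds exactly one new output layer (the classification head) on top of BERT
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=2)

# Hypothetical question/answer pair encoded as a sentence pair
batch = tok(["Which gas do plants absorb?"], ["Carbon dioxide."],
            padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1])  # illustrative label: this pairing is correct

out = model(**batch, labels=labels)  # cross-entropy loss over the new head
out.loss.backward()                  # gradients flow into all BERT layers
```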

Repurposing Entailment for Multi-Hop Question Answering Tasks

Multee, a general architecture that can effectively use entailment models for multi-hop QA tasks, is introduced; when using an entailment function pre-trained on NLI datasets, it outperforms QA models trained only on the target QA datasets as well as the OpenAI transformer models.
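
A simplified sketch of the entailment-for-QA recipe, assuming an off-the-shelf NLI model (roberta-large-mnli here) and plain averaging in place of Multee's learned aggregation over premises: each answer choice becomes a hypothesis, scored by how strongly retrieved sentences entail it.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def choice_score(premises, question, choice):
    """Average entailment probability of (question + choice) over the premises."""
    hypothesis = f"{question} {choice}"
    enc = tok(premises, [hypothesis] * len(premises),
              padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = nli(**enc).logits.softmax(-1)
    return probs[:, 2].mean().item()  # index 2 = "entailment" for this model

# Hypothetical retrieved sentences and choices; pick the best-entailed choice
premises = ["Plants take in carbon dioxide.", "Photosynthesis uses CO2 and light."]
print(choice_score(premises, "Which gas do plants absorb?", "Carbon dioxide."))
```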

Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.

Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions

This paper describes an alternative approach that operates at three levels of representation and reasoning: information retrieval, corpus statistics, and simple inference over a semi-automatically constructed knowledge base, to achieve substantially improved results.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

It is found that BERT was significantly undertrained and, when pretrained more carefully, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE, and SQuAD.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

Moving beyond the Turing Test with the Allen AI Science Challenge

Answering questions correctly from standardized eighth-grade science tests is itself a test of machine intelligence.

QuaRTz: An Open-Domain Dataset of Qualitative Relationship Questions

This work introduces the first open-domain dataset, called QuaRTz, for reasoning about textual qualitative relationships, and finds state-of-the-art results are substantially (20%) below human performance, presenting an open challenge to the NLP community.

MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms

A large-scale dataset of math word problems is introduced, along with an interpretable neural math problem solver that learns to map problems to their operation programs, and a new representation language to model the operation program corresponding to each problem; together these aim to improve both the performance and the interpretability of the learned models.
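
A toy interpreter illustrating the operation-program idea: the solver's output is a sequence of operations over the problem's quantities, and executing the program yields the answer. The operation names and example program are illustrative, not MathQA's exact representation language.

```python
# Operations index into the problem's quantities plus earlier results.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def run_program(program, quantities):
    """program: list of (op, i, j) triples over quantities and prior results."""
    vals = list(quantities)
    for op, i, j in program:
        vals.append(OPS[op](vals[i], vals[j]))
    return vals[-1]

# "A train covers 120 km in 2 hours, then drives 3 more hours at that speed.
#  What is the total distance?"
# speed = divide(120, 2); extra = multiply(speed, 3); total = add(120, extra)
program = [("divide", 0, 1), ("multiply", 3, 2), ("add", 0, 4)]
print(run_program(program, [120, 2, 3]))  # 300.0
```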

Can an AI get into the University of Tokyo?

For the thousands of secondary school students who take Japan's university entrance exams each year, test days are long-dreaded nightmares of jitters and sweaty palms. But the newest test taker can be…
...