Elementary School Science and Math Tests as a Driver for AI: Take the Aristo Challenge!

@inproceedings{Clark2015ElementarySS,
  title={Elementary School Science and Math Tests as a Driver for AI: Take the Aristo Challenge!},
  author={Peter Clark},
  booktitle={AAAI Conference on Artificial Intelligence},
  year={2015}
}
  • Peter Clark
  • Published in AAAI Conference on Artificial Intelligence, 25 January 2015
  • Computer Science
While there has been an explosion of impressive, data-driven AI applications in recent years, machines still largely lack a deeper understanding of the world to answer questions that go beyond information explicitly stated in text, and to explain and discuss those answers. To reach this next generation of AI applications, it is imperative to make faster progress in areas of knowledge, modeling, reasoning, and language. Standardized tests have often been proposed as a driver for such progress… 

Towards Literate Artificial Intelligence

A unified max-margin framework that learns to find hidden structures given a corpus of question-answer pairs and uses what it learns to answer questions on novel texts, obtaining state-of-the-art performance on two well-known natural language comprehension benchmarks.

Common Sense, the Turing Test, and the Quest for Real AI

Hector Levesque considers the role of language in learning, and identifies a possible mechanism behind common sense and the capacity to call on background knowledge: the ability to represent objects of thought symbolically.

Solving Mathematical Puzzles: a Deep Reasoning Challenge (Position Paper)

This work proposes a challenge: to design and implement an end-to-end solver for mathematical puzzles able to compete with primary school students, calling for an unprecedented integration of many different AI techniques.

Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions

This paper describes an alternative approach that operates at three levels of representation and reasoning: information retrieval, corpus statistics, and simple inference over a semi-automatically constructed knowledge base, to achieve substantially improved results.

Considerations for Evaluating Models of Language Understanding and Reasoning

A complementary task framework and evaluation dataset modeled closely on [1] is presented, which arguably preserves experimental control and allows for difficulty to be scaled up incrementally while also ensuring that all information relevant to solving the tasks is preserved in the training data.

Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks

This work introduces NUMBERGAME, a multifaceted benchmark to evaluate model performance across numerical reasoning tasks of eight diverse formats, building on recent progress in generic system development and demonstrating the scope of under-explored tasks.

Solving Mathematical Puzzles: A Challenging Competition for AI

Competitions have been, and continue to be, run on conversational behavior, automatic control, cooperation and coordination in robotics, logical reasoning and knowledge, and natural language; they have brought many insights and advances to various fields of artificial intelligence.

The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers

This survey focuses on algebraic word problems, summarizes the features extracted and the techniques proposed to bridge the semantic gap, and compares solver performance on publicly accessible datasets.

From LSAT: The Progress and Challenges of Complex Reasoning

This paper proposes a hybrid reasoning system that integrates the three tasks of the LSAT, namely analytical reasoning, logical reasoning, and reading comprehension, and sheds light on potential future directions such as unsupervised symbolic knowledge extraction, model interpretability, few-shot learning, and comprehensive benchmarks for complex reasoning.

The Measure of All Minds: Evaluating Natural and Artificial Intelligence

Using algorithmic information theory as a foundation, the book elaborates on the evaluation of perceptual, developmental, social, verbal and collective features and critically analyzes what the future of intelligence might look like.
...

References


The Limitations of Standardized Science Tests as Benchmarks for Artificial Intelligence Research: Position Paper

This position paper argues that standardized tests for elementary science, such as the SAT or Regents tests, are not very good benchmarks for measuring the progress of artificial intelligence systems in understanding basic science, and that more appropriate collections of exam-style problems could be assembled.

MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text

MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text; it requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.

A study of the knowledge base requirements for passing an elementary science test

The analysis suggests that, as well as fact extraction from text and statistically driven rule extraction, three other styles of automatic knowledge base construction (AKBC) would be useful: acquiring definitional knowledge, direct 'reading' of rules from texts that state them, and, given a particular representational framework, acquisition of specific instances of models in that framework from text.

The Winograd Schema Challenge

This paper presents an alternative to the Turing Test that has some conceptual and practical advantages: English-speaking adults will have no difficulty with it, and the subject is not required to engage in a conversation or to fool an interrogator into believing she is dealing with a person.

Overview of Todai Robot Project and Evaluation Framework of its NLP-based Problem Solving

The Todai Robot Project task focuses on benchmarking NLP systems for problem solving; the paper describes how question resources and their correct answers are managed, the answering tools, and how researchers participate in the task.

Diagram Understanding in Geometry Questions

This paper presents a method for diagram understanding that identifies visual elements in a diagram while maximizing agreement between textual and visual data, and shows that the method's objective function is submodular.

Can an AI get into the University of Tokyo?

For the thousands of secondary school students who take Japan's university entrance exams each year, test days are long-dreaded nightmares of jitters and sweaty palms. But the newest test taker can be…

The Grade 4 Elementary-Level Science Test, 2014.