Elementary School Science and Math Tests as a Driver for AI: Take the Aristo Challenge!
@inproceedings{Clark2015ElementarySS, title={Elementary School Science and Math Tests as a Driver for AI: Take the Aristo Challenge!}, author={Peter Clark}, booktitle={AAAI Conference on Artificial Intelligence}, year={2015} }
While there has been an explosion of impressive, data-driven AI applications in recent years, machines still largely lack a deeper understanding of the world to answer questions that go beyond information explicitly stated in text, and to explain and discuss those answers. To reach this next generation of AI applications, it is imperative to make faster progress in areas of knowledge, modeling, reasoning, and language. Standardized tests have often been proposed as a driver for such progress…
91 Citations
Towards Literate Artificial Intelligence
- Computer Science, Education
- 2020
A unified max-margin framework that learns to find hidden structures given a corpus of question-answer pairs, and uses what it learns to answer questions on novel texts to obtain state-of-the-art performance on two well-known natural language comprehension benchmarks.
Common Sense, the Turing Test, and the Quest for Real AI
- Computer Science
- 2017
Hector Levesque considers the role of language in learning, and identifies a possible mechanism behind common sense and the capacity to call on background knowledge: the ability to represent objects of thought symbolically.
Solving Mathematical Puzzles: a Deep Reasoning Challenge (Position Paper)
- Computer Science, URANIA@AI*IA
- 2016
This work proposes a challenge: to design and implement an end-to-end solver for mathematical puzzles able to compete with primary school students, calling for an unprecedented integration of many different AI techniques.
Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions
- Computer Science, AAAI
- 2016
This paper describes an alternative approach that operates at three levels of representation and reasoning: information retrieval, corpus statistics, and simple inference over a semi-automatically constructed knowledge base, to achieve substantially improved results.
Considerations for Evaluating Models of Language Understanding and Reasoning
- Computer Science
- 2015
A complementary task framework and evaluation dataset modeled closely on [1] is presented, which arguably preserves experimental control and allows for difficulty to be scaled up incrementally while also ensuring that all information relevant to solving the tasks is preserved in the training data.
Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks
- Computer Science, ArXiv
- 2020
This work introduces NUMBERGAME, a multifaceted benchmark to evaluate model performance across numerical reasoning tasks of eight diverse formats, and takes forward the recent progress in generic system development, demonstrating the scope of under-explored tasks.
Solving Mathematical Puzzles: A Challenging Competition for AI
- Computer Science, AI Mag.
- 2017
Competitions have been and are currently run on conversational behavior, automatic control, cooperation and coordination in robotics, logic reasoning and knowledge, and natural language; these have brought many insights and advancements in various artificial intelligence fields.
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2020
This survey focuses on algebraic word problems, summarizes their extracted features and proposed techniques for bridging the semantic gap, and compares their performance on the publicly accessible datasets.
From LSAT: The Progress and Challenges of Complex Reasoning
- Computer Science, IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2022
This paper proposes a hybrid reasoning system to integrate the three tasks of the LSAT, analytical reasoning, logical reasoning, and reading comprehension, and sheds light on potential future directions such as unsupervised symbolic knowledge extraction, model interpretability, few-shot learning, and comprehensive benchmarks for complex reasoning.
The Measure of All Minds: Evaluating Natural and Artificial Intelligence
- Psychology
- 2017
Using algorithmic information theory as a foundation, the book elaborates on the evaluation of perceptual, developmental, social, verbal and collective features and critically analyzes what the future of intelligence might look like.
References
Showing 1-10 of 12 references
The Limitations of Standardized Science Tests as Benchmarks for Artificial Intelligence Research: Position Paper
- Education, ArXiv
- 2014
This position paper argues that standardized tests for elementary science such as SAT or Regents tests are not very good benchmarks for measuring the progress of artificial intelligence systems in understanding basic science and that more appropriate collections of exam style problems could be assembled.
MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
- Computer Science, EMNLP
- 2013
MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text that requires machines to answer multiple-choice reading comprehension questions about fictional stories, directly tackling the high-level goal of open-domain machine comprehension.
A study of the knowledge base requirements for passing an elementary science test
- Computer Science, AKBC '13
- 2013
The analysis suggests that as well as fact extraction from text and statistically driven rule extraction, three other styles of automatic knowledge base construction (AKBC) would be useful: acquiring definitional knowledge, direct 'reading' of rules from texts that state them, and, given a particular representational framework, acquisition of specific instances of those models from text.
The Winograd Schema Challenge
- Linguistics, KR
- 2011
This paper presents an alternative to the Turing Test that has some conceptual and practical advantages: English-speaking adults will have no difficulty with it, and the subject is not required to engage in a conversation and fool an interrogator into believing she is dealing with a person.
Overview of Todai Robot Project and Evaluation Framework of its NLP-based Problem Solving
- Computer Science, LREC
- 2014
The Todai Robot Project task focuses on benchmarking NLP systems for problem solving; this paper describes the method for managing question resources and their correct answers, the answering tools, and researcher participation in the task.
Diagram Understanding in Geometry Questions
- Computer Science, AAAI
- 2014
This paper presents a method for diagram understanding that identifies visual elements in a diagram while maximizing agreement between textual and visual data, and shows that the method's objective function is submodular.
Can an AI get into the University of Tokyo?
- Education
- 2013
For the thousands of secondary school students who take Japan's university entrance exams each year, test days are long-dreaded nightmares of jitters and sweaty palms. But the newest test taker can be…
The Grade 4 Elementary-Level Science Test
- 2014