My Computer Is an Honor Student - but How Intelligent Is It? Standardized Tests as a Measure of AI

@article{Clark2016MyCI,
  title={My Computer Is an Honor Student - but How Intelligent Is It? Standardized Tests as a Measure of AI},
  author={Peter Clark and Oren Etzioni},
  journal={AI Mag.},
  year={2016},
  volume={37},
  pages={5-12}
}
Given the well-known limitations of the Turing Test, there is a need for objective tests to both focus attention on, and measure progress towards, the goals of AI. In this paper we argue that machine performance on standardized tests should be a key component of any new measure of AI, because attaining a high level of performance requires solving significant AI problems involving language understanding and world modeling - critical skills for any machine that lays claim to intelligence. In… 
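
The paper's central proposal is that exam performance should anchor how we measure AI. To make the evaluation paradigm concrete, here is a minimal, purely illustrative sketch in Python: the question format, the toy questions, and the random-guess baseline below are hypothetical stand-ins (not the authors' Aristo system or actual Regents exam data); the point is only that exam-style evaluation reduces to running a solver over a question set and reporting accuracy.

```python
from dataclasses import dataclass
from typing import Callable, List
import random

# Hypothetical multiple-choice question format; real exam sets (e.g., NY Regents,
# AI2's ARC) have their own schemas -- this is only an illustrative stand-in.
@dataclass
class MCQuestion:
    stem: str
    options: List[str]
    answer_index: int  # index of the correct option

def exam_accuracy(questions: List[MCQuestion],
                  solver: Callable[[str, List[str]], int]) -> float:
    """Score a solver on a question set: fraction of questions answered correctly."""
    correct = sum(1 for q in questions
                  if solver(q.stem, q.options) == q.answer_index)
    return correct / len(questions)

# A trivial baseline: guess uniformly at random (expected ~25% on 4-way questions).
def random_guess_solver(stem: str, options: List[str]) -> int:
    return random.randrange(len(options))

if __name__ == "__main__":
    # Made-up elementary-science-style questions, for illustration only.
    toy_exam = [
        MCQuestion("Which form of energy does a plant use to make food?",
                   ["sound", "light", "magnetic", "mechanical"], 1),
        MCQuestion("Which instrument measures air temperature?",
                   ["barometer", "thermometer", "anemometer", "rain gauge"], 1),
    ]
    print(f"Random-guess baseline accuracy: {exam_accuracy(toy_exam, random_guess_solver):.2f}")
```

Substituting a real question-answering system for the random-guess baseline yields the kind of exam score the paper argues should be tracked as a benchmark.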

Citations

Towards Literate Artificial Intelligence
TLDR: A unified max-margin framework that learns to find hidden structures given a corpus of question-answer pairs, and uses what it learns to answer questions on novel texts to obtain state-of-the-art performance on two well-known natural language comprehension benchmarks.
A Survey of Question Answering for Math and Science Problem
TLDR: The progress made towards the goal of making a machine smart enough to pass standardized tests is explored, and the challenges and opportunities posed by the domain are examined.
From 'F' to 'A' on the N.Y. Regents Science Exams: An Overview of the Aristo Project
TLDR: Success is reported on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90 percent on the exam’s non-diagram, multiple-choice (NDMC) questions, demonstrating that modern natural language processing methods can achieve mastery of this task.
Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions
TLDR: This paper describes an alternative approach that operates at three levels of representation and reasoning: information retrieval, corpus statistics, and simple inference over a semi-automatically constructed knowledge base, to achieve substantially improved results.
Easy Questions First? A Case Study on Curriculum Learning for Question Answering
TLDR: This work compares a number of curriculum learning proposals in the context of four non-convex models for QA and shows that they lead to real improvements in each of them.
The Measure of All Minds: Evaluating Natural and Artificial Intelligence
TLDR: Using algorithmic information theory as a foundation, the book elaborates on the evaluation of perceptual, developmental, social, verbal and collective features and critically analyzes what the future of intelligence might look like.
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
TLDR: A new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering constitute the AI2 Reasoning Challenge (ARC), which requires far more powerful knowledge and reasoning than previous challenges such as SQuAD or SNLI.
Measuring abstract reasoning in neural networks
TLDR: A dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test, is proposed, and ways to both measure and induce stronger abstract reasoning in neural networks are introduced.
Post-Turing Methodology: Breaking the Wall on the Way to Artificial General Intelligence
TLDR: Comprehensive criticism of the Turing test is offered and quality criteria for new artificial general intelligence assessment tests are developed, suggesting that by restricting thinking ability to symbolic systems alone, Turing unknowingly constructed “the wall” that excludes any possibility of transition from a complex observable phenomenon to an abstract image or concept.
…

References

Showing 1-10 of 47 references
Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
TLDR: This work argues for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering, and classifies these tasks into skill sets so that researchers can identify (and then rectify) the failings of their systems.
Combining Retrieval, Statistics, and Inference to Answer Elementary Science Questions
TLDR: This paper describes an alternative approach that operates at three levels of representation and reasoning: information retrieval, corpus statistics, and simple inference over a semi-automatically constructed knowledge base, to achieve substantially improved results.
A study of the knowledge base requirements for passing an elementary science test
TLDR: The analysis suggests that, as well as fact extraction from text and statistically driven rule extraction, three other styles of automatic knowledge base construction (AKBC) would be useful: acquiring definitional knowledge, direct 'reading' of rules from texts that state them, and, given a particular representational framework, acquisition of specific instances of those models from text.
MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
TLDR: MCTest is presented, a freely available set of stories and associated questions intended for research on the machine comprehension of text, requiring machines to answer multiple-choice reading comprehension questions about fictional stories and directly tackling the high-level goal of open-domain machine comprehension.
Psychometric artificial intelligence
TLDR: The paper in the present issue of JETAI can be plausibly viewed as resurrecting this approach to machine intelligence, in the form of what is called Psychometric AI, or just PAI (rhymes with ‘π’).
Exploring Markov Logic Networks for Question Answering
TLDR: A system that reasons with knowledge derived from textbooks, represented in a subset of first-order logic, called Praline, which demonstrates a 15% accuracy boost and a 10x reduction in runtime as compared to other MLN-based methods, and comparable accuracy to word-based baseline approaches.
The Winograd Schema Challenge
TLDR: This paper presents an alternative to the Turing Test that has some conceptual and practical advantages: English-speaking adults will have no difficulty with it, and the subject is not required to engage in a conversation and fool an interrogator into believing she is dealing with a person.
A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.
TLDR: A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena. (A minimal sketch of the core LSA computation appears after this reference list.)
Computing Machinery and Intelligence (A. Turing, 1950; reprinted in The Philosophy of Artificial Intelligence)
TLDR: The question “Can machines think?” is considered and replaced by another, which is closely related to it and is expressed in relatively unambiguous words.
Learning to Solve Arithmetic Word Problems with Verb Categorization
TLDR: The paper analyzes the arithmetic word problem “genre”, identifying seven categories of verbs used in such problems, reports the first learning results on this task without reliance on predefined templates, and makes the data publicly available.
…
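
The Landauer and Dumais reference above centers on latent semantic analysis (LSA). As a minimal sketch of the core computation, assuming a made-up toy corpus and an arbitrary choice of k=2 latent dimensions (real LSA work uses large corpora, hundreds of dimensions, and weighting schemes such as log-entropy, none of which are shown here), LSA builds a term-document count matrix, applies a truncated SVD, and compares words by cosine similarity in the reduced space:

```python
import numpy as np

# Toy corpus for illustration only; real LSA studies use large text collections.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are animals",
    "intelligence tests measure reasoning",
    "standardized tests measure knowledge and reasoning",
]

# Build the term-document count matrix.
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        X[index[w], j] += 1

# Truncated SVD: keep k latent dimensions (k = 2 only because the corpus is tiny).
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
word_vecs = U[:, :k] * s[:k]       # word representations in the latent space
doc_vecs = Vt[:k, :].T * s[:k]     # document representations in the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Words that occur in similar contexts end up close together in the latent space.
print(cosine(word_vecs[index["tests"]], word_vecs[index["reasoning"]]))
print(cosine(word_vecs[index["tests"]], word_vecs[index["cat"]]))
# Documents can be compared the same way, e.g. the two test-related documents.
print(cosine(doc_vecs[3], doc_vecs[4]))
```

The reduced space is what lets LSA judge two terms as related even when they never co-occur in the same document, which is the induction step emphasized in the paper.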