Corpus ID: 237635337

Towards A Measure Of General Machine Intelligence

Gautham Venkatasubramanian, Sibesh Kar, Abhimanyu Singh, Shubham Mishra, Dushyant Yadav, Shreyansh Chandak
To build general-purpose artificial intelligence systems that can deal with unknown variables across unknown domains, we need benchmarks that measure how well these systems perform on tasks they have never seen before. A prerequisite for this is a measure of a task's generalization difficulty, or how dissimilar it is from the system's prior knowledge and experience. If the skill of an intelligence system in a particular domain is defined as its ability to consistently generate a set of…


Neuro-Symbolic Program Synthesis
This paper proposes a novel technique, Neuro-Symbolic Program Synthesis, that can automatically construct computer programs in a domain-specific language that are consistent with a set of input-output examples provided at test time, and demonstrates the effectiveness of the approach by applying it to the rich and complex domain of regular-expression-based string transformations.
IQ tests are not for machines, yet
Complex but specific tasks, such as chess or Jeopardy!, are popularly seen as milestones for artificial intelligence (AI). However, they are not appropriate for evaluating the intelligence of…
Contemporary Approaches to Artificial General Intelligence
The aim is not to present AGI as a mature field of computer science, which would be inappropriate, but to present a series of approaches to AGI that are at an early stage of engineering development, some of them still in the design phase.
Universal Intelligence: A Definition of Machine Intelligence
A number of well-known informal definitions of human intelligence are taken and mathematically formalised to produce a general measure of intelligence for arbitrary machines that formally captures the concept of machine intelligence in the broadest reasonable sense.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Program Synthesis from Natural Language Using Recurrent Neural Networks
Oftentimes, a programmer may have difficulty implementing a desired operation. Even when the programmer can describe her goal in English, it can be difficult to translate into code. Existing resources,…
Language Models are Unsupervised Multitask Learners
It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis
Reinforcement learning is performed on top of a supervised model with an objective that explicitly maximizes the likelihood of generating semantically correct programs, which leads to improved accuracy of the models, especially in cases where the training data is limited.
Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement
This paper critically assesses the different ways AI systems are evaluated, and the role of components and techniques in these systems, and identifies three kinds of evaluation: human discrimination, problem benchmarks and peer confrontation.
Program Synthesis with Large Language Models
The limits of the current generation of large language models for program synthesis in general-purpose programming languages are explored, and the semantic grounding of these models is examined by fine-tuning them to predict the results of program execution.