Corpus ID: 199668689

PHYRE: A New Benchmark for Physical Reasoning

@inproceedings{Bakhtin2019PHYREAN,
  title={PHYRE: A New Benchmark for Physical Reasoning},
  author={Anton Bakhtin and Laurens van der Maaten and Justin Johnson and Laura Gustafson and Ross B. Girshick},
  booktitle={NeurIPS},
  year={2019}
}
Understanding and reasoning about physics is an important ability of intelligent agents. […] We test several modern learning algorithms on PHYRE and find that these algorithms fall short in solving the puzzles efficiently. We expect that PHYRE will encourage the development of novel sample-efficient agents that learn efficient but useful models of physics. For code and to play PHYRE for yourself, please visit this https URL.
Citations

Cracking PHYRE with a World Model
TLDR
The results show that the world model is able to help an agent reason about physics, and is capable of perceiving environmental changes and predicting plausible evolution of the environment based on its perceived information.
Phy-Q: A Testbed for Physical Reasoning
TLDR
A new testbed, inspired by how human IQ is calculated, that requires an agent to reason about physical scenarios and act appropriately; it aims to encourage the development of intelligent agents that can reach a human-level Phy-Q score.
Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning
TLDR
This work proposes that the flexibility of human physical problem solving rests on an ability to imagine the effects of hypothesized actions, while the efficiency of human search arises from rich action priors which are updated via observations of the world.
Forward Prediction for Physical Reasoning
TLDR
Forward-prediction models are found to improve the performance of physical-reasoning agents, particularly on complex tasks that involve many objects. However, these improvements are contingent on the training tasks being similar to the test tasks; generalization to different tasks is more challenging.
ESPRIT: Explaining Solutions to Physical Reasoning Tasks
TLDR
ESPRIT is a framework for commonsense reasoning about qualitative physics in natural language. It generates interpretable descriptions of physical events using a data-to-text approach and learns to generate explanations of how the physical simulation will causally evolve.
Physical Reasoning Using Dynamics-Aware Models
TLDR
This study defines a distance measure between the trajectories of two target objects, uses this measure to characterize the similarity of two environment rollouts, and trains the model to rank rollouts correctly according to this measure in addition to predicting the correct reward.
AGENT: A Benchmark for Core Psychological Reasoning
TLDR
AGENT, a benchmark consisting of a large dataset of procedurally generated 3D animations, suggests that to pass its tests of core intuitive psychology at human levels, a model must acquire or have built-in representations of how agents plan, combining utility computations with core knowledge of objects and physics.
A Survey on Machine Learning Approaches for Modelling Intuitive Physics
TLDR
The survey first categorizes existing deep learning approaches into three facets of physical reasoning, then organizes them into three general technical approaches, and proposes six categorical tasks for the field.
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning
TLDR
CausalWorld is proposed: a benchmark for causal structure and transfer learning in a robotic manipulation environment that simulates an open-source robotic platform, hence offering the possibility of sim-to-real transfer.
A Measure of Visuospatial Reasoning Skills: Painting the Big Picture
TLDR
A comprehensive benchmark, with properties including breadth, depth, explainability, and domain-specificity, would allow for broader analysis of how applicable existing datasets and agents are to the problem of generalized visuospatial reasoning.
...

References

Showing 1–10 of 62 references
Physical Reasoning
  • E. Davis
  • Handbook of Knowledge Representation, 2008
Running the Table: An AI for Computer Billiards
TLDR
Several approaches to establishing a strong AI for billiards are developed, and the resulting program, PickPocket, won the first international computer billiards competition.
The Tools Challenge: Rapid Trial-and-Error Learning in Physical Problem Solving
TLDR
The authors discuss how the Tools challenge might guide the development of better physical-reasoning agents in AI, as well as better accounts of human physical reasoning and tool use.
IntPhys: A Framework and Benchmark for Visual Intuitive Physics Reasoning
TLDR
An evaluation framework that diagnoses how much a given system understands about physics by testing whether it can tell apart well-matched videos of possible versus impossible events. The paper also describes the first release of a benchmark dataset aimed at learning intuitive physics in an unsupervised way.
Probing Physics Knowledge Using Tools from Developmental Psychology
TLDR
This work introduces the violation-of-expectation (VOE) technique and describes a set of probe datasets inspired by classic test stimuli from developmental psychology. These are tested on a baseline deep learning system, as well as on a physics learning dataset recently posed by another research group.
Inferring and Executing Programs for Visual Reasoning
TLDR
The paper proposes a model for visual reasoning that consists of a program generator, which constructs an explicit representation of the reasoning process to be performed, and an execution engine, which executes the resulting program to produce an answer.
Reasoning About Physical Interactions with Object-Oriented Prediction and Planning
TLDR
This work presents a paradigm for learning object-centric representations for physical scene understanding without direct supervision of object properties, and can use its learned representations to build block towers more complicated than those observed during training.
ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking
TLDR
This paper provides ShapeStacks, a simulation-based dataset featuring 20,000 stack configurations composed of a variety of elementary geometric primitives and richly annotated regarding semantics and structural stability. It trains visual classifiers for binary stability prediction on the data and scrutinises their learned physical intuition.
Do New Caledonian crows solve physical problems through causal reasoning?
TLDR
It is suggested that New Caledonian crows can solve complex physical problems by reasoning both causally and analogically about causal relations, an ability that may form the basis of the New Caledonian crow's exceptional tool skills.
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
TLDR
This work presents a diagnostic dataset that tests a range of visual reasoning abilities and uses this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.
...