"What Happens If..." Learning to Predict the Effect of Forces in Images

@article{Mottaghi2016WhatHI,
  title={"What Happens If..." Learning to Predict the Effect of Forces in Images},
  author={Roozbeh Mottaghi and Mohammad Rastegari and Abhinav Gupta and Ali Farhadi},
  journal={ArXiv},
  year={2016},
  volume={abs/1603.05600}
}
What happens if one pushes a cup sitting on a table toward the edge of the table? How about pushing a desk against a wall? In this paper, we study the problem of understanding the movements of objects as a result of applying external forces to them. For a given force vector applied to a specific location in an image, our goal is to predict long-term sequential movements caused by that force. Doing so entails reasoning about scene geometry, objects, their attributes, and the physical rules that…
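As a rough illustration of the task set up in the abstract, the sketch below shows one way the inputs and outputs could be wired together: an image, a force vector, and the pixel where the force is applied go in, and a sequence of discrete movement predictions comes out. This is a hypothetical PyTorch sketch, not the authors' architecture; the layer sizes, the four-dimensional force encoding, and the names ForceEffectPredictor, NUM_MOVE_CLASSES, and SEQ_LEN are assumptions made for illustration.

# Hypothetical sketch of the task interface described in the abstract:
# given an image, a force vector, and its point of application, predict a
# short sequence of discrete object movements. NOT the authors' model;
# sizes and the movement vocabulary are assumptions.
import torch
import torch.nn as nn

NUM_MOVE_CLASSES = 18   # assumed: quantized movement directions plus "stop"
SEQ_LEN = 8             # assumed: number of future steps to predict

class ForceEffectPredictor(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        # Small CNN encoder for the RGB image (assumed 3x128x128 input).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Force encoded as (fx, fy, x, y): direction/magnitude plus the
        # pixel location where it is applied (assumption).
        self.force_mlp = nn.Sequential(nn.Linear(4, 64), nn.ReLU())
        # Recurrent decoder rolls out the sequence of movement classes.
        self.rnn = nn.GRU(128 + 64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_MOVE_CLASSES)

    def forward(self, image, force):
        img_feat = self.encoder(image)                  # (B, 128)
        frc_feat = self.force_mlp(force)                # (B, 64)
        fused = torch.cat([img_feat, frc_feat], dim=1)  # (B, 192)
        # Feed the same fused feature at every step (simplifying assumption).
        steps = fused.unsqueeze(1).expand(-1, SEQ_LEN, -1)
        out, _ = self.rnn(steps)
        return self.head(out)                           # (B, SEQ_LEN, classes)

if __name__ == "__main__":
    model = ForceEffectPredictor()
    image = torch.randn(2, 3, 128, 128)
    force = torch.randn(2, 4)
    print(model(image, force).shape)  # torch.Size([2, 8, 18])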
Citations

Use the Force, Luke! Learning to Predict Physical Forces by Simulating Effects
This paper addresses the problem of inferring contact points and physical forces from videos of humans interacting with objects by using a physics simulator to predict effects, enforcing that the estimated forces must lead to the same effect as depicted in the video.
Learning Vision-Based Physics Intuition Models for Non-Disruptive Object Extraction
The results, in both simulation and real-world settings, show that with the proposed method, physics intuition models can be used to inform a robot of which objects can be safely extracted and from which direction to extract them.
IntPhys: A Framework and Benchmark for Visual Intuitive Physics Reasoning
An evaluation framework that diagnoses how much a given system understands about physics by testing whether it can tell apart well-matched videos of possible versus impossible events, together with the first release of a benchmark dataset aimed at learning intuitive physics in an unsupervised way.
Unsupervised Learning for Physical Interaction through Video Prediction
An action-conditioned video prediction model is developed that explicitly models pixel motion by predicting a distribution over pixel motion from previous frames, and is partially invariant to object appearance, enabling it to generalize to previously unseen objects.
Learning to Exploit Stability for 3D Scene Parsing
A novel architecture for 3D scene parsing named Prim R-CNN is presented, which learns to predict bounding boxes as well as their 3D size, translation, and rotation; applying physics supervision on unlabeled real images is shown to improve real-domain transfer of models trained on synthetic data.
Visual Interaction Networks: Learning a Physics Simulator from Video
The Visual Interaction Network is introduced, a general-purpose model for learning the dynamics of a physical system from raw visual observations, consisting of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks.
Learning and Reasoning with Visual Correspondence in Time
It is argued that one needs to go beyond images and exploit the massive amount of correspondence in videos, and that capturing long-range correspondence is key to video understanding as well as interaction reasoning.
Visual Assessment for Non-Disruptive Object Extraction
Robots operating in human environments need to perform a variety of dexterous manipulation tasks on object arrangements that have complex physical support relationships, e.g. procuring utensils from…
IntPhys 2019: A Benchmark for Visual Intuitive Physics Understanding
In order to reach human performance on complex visual tasks, artificial systems need to incorporate a significant amount of understanding of the world in terms of macroscopic objects, movements, …

References

Showing 1–10 of 49 references
Newtonian Image Understanding: Unfolding the Dynamics of Objects in Static Images
This paper defines intermediate physical abstractions called Newtonian scenarios and introduces the Newtonian Neural Network (N3), which learns to map a single image to a state in a Newtonian scenario.
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
This study points towards an account of human vision with generative physical knowledge at its core, and various recognition models as helpers leading to efficient inference.
Dense Optical Flow Prediction from a Static Image
This work presents a convolutional neural network (CNN) based approach for motion prediction that outperforms all previous approaches by large margins and can predict future optical flow on a diverse set of scenarios.
Show and tell: A neural image caption generator
This paper presents a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image.
Learning Visual Predictive Models of Physics for Playing Billiards
This paper explores how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations ("visual imagination").
Predicting Object Dynamics in Scenes
This paper learns from sequences of abstract images gathered using crowd-sourcing to overcome a lack of densely annotated spatiotemporal data, and demonstrates qualitatively and quantitatively that the learned models produce plausible scene predictions on both abstract images and natural images taken from the Internet.
Learning to place new objects in a scene
A learning approach for placing multiple objects in different placing areas in a scene using an integer linear program and a graphical model to encode various properties, such as the stacking of objects, stability, object–area relationship and common placing constraints.
Predictive Visual Models of Physics for Playing Billiards
Imagine a hypothetical person who has never encountered the game of billiards in their life. When introduced to the game, this person may not be very adept at playing the game, but would be capable…
Action-Conditional Video Prediction using Deep Networks in Atari Games
This paper is the first to make and evaluate long-term predictions on high-dimensional video conditioned by control inputs, and proposes and evaluates two deep neural network architectures that consist of encoding, action-conditional transformation, and decoding layers based on convolutional neural networks and recurrent neural networks.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene…