Corpus ID: 203836132

Structured Object-Aware Physics Prediction for Video Modeling and Planning

@article{Kossen2020StructuredOP,
  title={Structured Object-Aware Physics Prediction for Video Modeling and Planning},
  author={Jannik Kossen and Karl Stelzner and Marcel Hussing and C. Voelcker and Kristian Kersting},
  journal={ArXiv},
  year={2020},
  volume={abs/1910.02425}
}
When humans observe a physical system, they can easily locate objects, understand their interactions, and anticipate future behavior, even in settings with complicated and previously unseen interactions. For computers, however, learning such models from videos in an unsupervised fashion is an unsolved research problem. In this paper, we present STOVE, a novel state-space model for videos, which explicitly reasons about objects and their positions, velocities, and interactions. It is constructed…
A Symmetric and Object-Centric World Model for Stochastic Environments
Object-centric world models learn useful representations for planning and control but have so far only been applied to synthetic and deterministic environments. We introduce a…
Unsupervised Video Prediction from a Single Frame by Estimating 3D Dynamic Scene Structure
TLDR
Experiments on two challenging datasets of natural videos show that the model developed can estimate 3D structure and motion segmentation from a single frame, and hence generate plausible and varied predictions.
RAIN: Reinforced Hybrid Attention Inference Network for Motion Forecasting
TLDR
A generic motion forecasting framework with dynamic key information selection and ranking based on a hybrid attention mechanism that not only achieves state-of-the-art forecasting performance, but also provides interpretable and reasonable hybrid attention weights.
Structured World Belief for Reinforcement Learning in POMDP
TLDR
This paper proposes Structured World Belief, a model for learning and inference of object-centric belief states inferred by Sequential Monte Carlo (SMC), and shows the efficacy of structured world belief in improving the performance of reinforcement learning, planning and supervised reasoning.
Unsupervised Video Decomposition using Spatio-temporal Iterative Inference
TLDR
This work proposes a novel spatio-temporal iterative inference framework that is powerful enough to jointly model complex multi-object representations and explicit temporal dependencies between latent variables across frames, and demonstrates the decomposition, segmentation, and prediction capabilities of this model.
GATSBI: Generative Agent-centric Spatio-temporal Object Interaction
We present GATSBI, a generative model that can transform a sequence of raw observations into a structured latent representation that fully captures the spatio-temporal context of the agent’s actions.
Self-Supervised Decomposition, Disentanglement and Prediction of Video Sequences while Interpreting Dynamics: A Koopman Perspective
TLDR
This work proposes a method to decompose a video into moving objects and their attributes, and model each object’s dynamics with linear system identification tools, by means of a Koopman embedding, which allows interpretation, manipulation and extrapolation of the dynamics of the different objects by employing the Koopman operator K.
Generalization and Robustness Implications in Object-Centric Learning
TLDR
This paper trains state-of-the-art unsupervised models on five common multi-object datasets and evaluates segmentation accuracy and downstream object property prediction and finds object-centric representations to be generally useful for downstream tasks and robust to shifts in the data distribution.
Spatially Structured Recurrent Modules
TLDR
This work models the dynamical system as a collection of autonomous but sparsely interacting sub-systems that interact according to a learned topology which is informed by the spatial structure of the underlying system, which gives rise to a class of models that are well suited for capturing the dynamics of systems that only offer local views into their state.
Uncovering Closed-form Governing Equations of Nonlinear Dynamics from Videos
TLDR
A novel end-to-end unsupervised deep learning framework that uncovers the mathematical structure of equations that govern the dynamics of moving objects in videos and enables discovery of a parsimonious interpretable model in a flexible and accessible sensing environment where only videos are available.

References

Reasoning About Physical Interactions with Object-Centric Models
Object-based factorizations provide a useful level of abstraction for interacting with the world. Building explicit object representations, however, often requires supervisory signals that are…
Unsupervised Intuitive Physics from Visual Observations
TLDR
It is demonstrated for the first time that it is possible to learn reliable extrapolators of the object trajectories from raw videos alone, without any form of external supervision and with no more prior knowledge than the choice of a convolutional neural network architecture.
Graph networks as learnable physics engines for inference and control
TLDR
A new class of learnable models is introduced, based on graph networks, which implements an inductive bias for object- and relation-centric representations of complex, dynamical systems, and offers new opportunities for harnessing and exploiting rich knowledge about the world.
A Compositional Object-Based Approach to Learning Physical Dynamics
TLDR
The NPE's compositional representation of the structure in physical interactions improves its ability to predict movement, generalize across variable object count and different scene configurations, and infer latent properties of objects such as mass.
Visual Interaction Networks: Learning a Physics Simulator from Video
TLDR
The Visual Interaction Network is introduced, a general-purpose model for learning the dynamics of a physical system from raw visual observations, consisting of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks.
Learning Visual Predictive Models of Physics for Playing Billiards
TLDR
This paper explores how an agent can be equipped with an internal model of the dynamics of the external world, and how it can use this model to plan novel actions by running multiple internal simulations ("visual imagination").
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
TLDR
This study points towards an account of human vision with generative physical knowledge at its core, and various recognition models as helpers leading to efficient inference.
Physics 101: Learning Physical Object Properties from Unlabeled Videos
TLDR
An unsupervised representation learning model is proposed, which explicitly encodes basic physical laws into the structure and uses them, with automatically discovered observations from videos, as supervision, and demonstrates how its generative nature enables solving other tasks such as outcome prediction.
Physics-as-Inverse-Graphics: Joint Unsupervised Learning of Objects and Physics from Video
TLDR
The approach significantly outperforms related unsupervised methods in long-term future frame prediction of systems with interacting objects (such as ball-spring or 3-body gravitational systems) and provides unique capabilities in goal-driven control and physical reasoning for zero-data adaptation.
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network…