Hierarchical Relational Inference

Aleksandar Stanić, Sjoerd van Steenkiste, Jürgen Schmidhuber
Corpus ID: 222208727
Common-sense physical reasoning in the real world requires learning about the interactions of objects and their dynamics. The notion of an abstract object, however, encompasses a wide variety of physical objects that differ greatly in terms of the complex behaviors they support. To address this, we propose a novel approach to physical reasoning that models objects as hierarchies of parts that may locally behave separately, but also act more globally as a single whole. Unlike prior approaches… 
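The part-whole idea above can be sketched as a toy two-level update: parts evolve under their own local dynamics, then a single "whole" node, computed from all parts, broadcasts a shared correction back to them. This is a minimal illustration of the general principle, not the paper's actual model; `part_update` and `whole_update` are hypothetical stand-ins for what would be learned networks.

```python
import numpy as np

def hierarchical_step(parts, part_update, whole_update):
    """Toy two-level hierarchy: parts behave locally but are also
    coordinated through a single whole-object node.

    parts:        (K, D) array of part state vectors for one object.
    part_update:  local dynamics applied to each part independently.
    whole_update: correction computed from the whole (mean of parts)
                  and broadcast back to every part.
    """
    local = np.stack([part_update(p) for p in parts])  # parts act separately
    whole = local.mean(axis=0)                         # abstract whole
    return local + whole_update(whole)                 # parts act as one

# Hypothetical stand-in dynamics: parts decay locally, the whole
# applies a shared shift to every part.
part_update = lambda p: p * 0.9
whole_update = lambda w: 0.1 * w

parts = np.array([[1.0, 0.0], [0.0, 1.0]])
next_parts = hierarchical_step(parts, part_update, whole_update)
```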
On the Binding Problem in Artificial Neural Networks
This paper proposes a unifying framework that revolves around forming meaningful entities from unstructured sensory inputs, maintaining this separation of information at a representational level (representation), and using these entities to construct new inferences, predictions, and behaviors (composition).
KINet: Keypoint Interaction Networks for Unsupervised Forward Modeling
KINet (Keypoint Interaction Network) is proposed: an end-to-end unsupervised framework that reasons about object interactions in complex systems based on a keypoint representation and learns plannable object-centric representations that can also be used in downstream model-based control tasks.
Unsupervised Image Decomposition with Phase-Correlation Networks
The Phase-Correlation Decomposition Network (PCDNet), a novel model that decomposes a scene into its object components, which are represented as transformed versions of a set of learned object prototypes, is proposed.
VAEL: Bridging Variational Autoencoders and Probabilistic Logic Programming
To the best of the authors' knowledge, this work is the first to propose a general-purpose end-to-end framework integrating probabilistic logic programming into a deep generative model.
Boxhead: A Dataset for Learning Hierarchical Representations
This work introduces Boxhead, a dataset with hierarchically structured ground-truth generative factors, and uses it to evaluate state-of-the-art autoencoder-based disentanglement models, observing that hierarchical models generally outperform single-layer VAEs at disentangling hierarchically arranged factors.
Generative Scene Graph Networks
Generative Scene Graph Networks are proposed, the first deep generative model that learns to discover the primitive parts and infer the part-whole relationship jointly from multi-object scenes without supervision and in an end-to-end trainable way.
Leveraging Hidden Structure in Self-Supervised Learning
A principled framework based on a mutual information objective that integrates self-supervised and structure learning is proposed; it achieves higher generalization performance in downstream classification tasks and provides more interpretable representations than those learned through traditional self-supervised learning.
Unsupervised Object Keypoint Learning using Local Spatial Predictability
The efficacy of PermaKey is demonstrated on Atari, where it learns keypoints corresponding to the most salient object parts and is robust to certain visual distractors; agents equipped with these keypoints are shown to outperform those using competing alternatives, even in challenging environments with moving backgrounds or distractor objects.
Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions
This work presents a novel method that learns to discover objects and model their physical interactions from raw visual images in a purely unsupervised fashion and incorporates prior knowledge about the compositional nature of human perception to factor interactions between object-pairs and learn efficiently.
Neural Expectation Maximization
This paper explicitly formalizes the automated discovery of distributed symbol-like representations in a spatial mixture model where each component is parametrized by a neural network and derives a differentiable clustering method that simultaneously learns how to group and represent individual entities.
Multi-Object Representation Learning with Iterative Variational Inference
This work argues for the importance of learning to segment and represent objects jointly, and demonstrates that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations.
Graph networks as learnable physics engines for inference and control
A new class of learnable models is introduced, based on graph networks, which implement an inductive bias for object- and relation-centric representations of complex dynamical systems and offer new opportunities for harnessing and exploiting rich knowledge about the world.
A Compositional Object-Based Approach to Learning Physical Dynamics
The NPE's compositional representation of the structure in physical interactions improves its ability to predict movement, generalize across variable object count and different scene configurations, and infer latent properties of objects such as mass.
Generative Hierarchical Models for Parts, Objects, and Scenes
This paper proposes the first deep latent variable model, called RICH, for learning Representation of Interpretable Compositional Hierarchies, a latent scene graph representation that organizes the entities of a scene into a tree structure according to their compositional relationships.
Interaction Networks for Learning about Objects, Relations and Physics
The interaction network is introduced, a model which can reason about how objects in complex systems interact, supporting dynamical predictions, as well as inferences about the abstract properties of the system, and is implemented using deep neural networks.
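The interaction-network update described here can be sketched as follows: a relation function computes a pairwise effect for every edge, effects are summed per receiver, and an object function updates each state. In the paper both functions are learned neural networks; the closures below are hypothetical stand-ins that keep the sketch self-contained.

```python
import numpy as np

def interaction_network_step(states, edges, f_rel, f_obj):
    """One forward pass of a minimal interaction-network-style update.

    states: (N, D) array of per-object state vectors.
    edges:  list of (sender, receiver) index pairs.
    f_rel:  relation function mapping a (sender, receiver) state pair
            to an effect vector.
    f_obj:  object function mapping (state, aggregated effect) to the
            next state.
    """
    N, D = states.shape
    effects = np.zeros((N, D))
    # Compute pairwise effects and aggregate them per receiver.
    for s, r in edges:
        effects[r] += f_rel(states[s], states[r])
    # Update each object from its own state plus incoming effects.
    return np.stack([f_obj(states[i], effects[i]) for i in range(N)])

# Hypothetical stand-ins for the learned networks: the effect pulls
# the receiver toward the sender, and the object function integrates
# the aggregated effect into the state.
f_rel = lambda s, r: 0.1 * (s - r)
f_obj = lambda s, e: s + e

states = np.array([[0.0, 0.0], [1.0, 1.0]])
edges = [(0, 1), (1, 0)]
next_states = interaction_network_step(states, edges, f_rel, f_obj)
```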
Neural Relational Inference for Interacting Systems
The NRI model is introduced: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data, in the form of a variational auto-encoder.
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network…
Flexible Neural Representation for Physics Prediction
The Hierarchical Relation Network (HRN) is described, an end-to-end differentiable neural network based on hierarchical graph convolution that learns to predict physical dynamics in this hierarchical particle-based object representation.