Robust and Controllable Object-Centric Learning through Energy-based Models
@article{Zhang2022RobustAC,
  title   = {Robust and Controllable Object-Centric Learning through Energy-based Models},
  author  = {Ruixiang Zhang and Tong Che and B. Ivanovic and Renhao Wang and Marco Pavone and Yoshua Bengio and Liam Paull},
  journal = {ArXiv},
  year    = {2022},
  volume  = {abs/2210.05519}
}
Humans are remarkably good at understanding and reasoning about complex visual scenes. The capability to decompose low-level observations into discrete objects allows us to build a grounded abstract representation and identify the compositional structure of the world. Accordingly, it is a crucial step for machine learning models to be capable of inferring objects and their properties from visual scenes without explicit supervision. However, existing works on object-centric representation…
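The abstract is truncated above, but the title points to an energy-based formulation of object-centric learning. Below is a minimal, hypothetical sketch of one common way to realise that idea: object-centric latents ("slots") are inferred by a few steps of noisy gradient descent (Langevin-style) on a learned energy function over image features and slots. The energy network, its architecture, and all hyperparameters here are assumptions for illustration, not the paper's exact method.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """A permutation-invariant energy over image features and slot vectors (illustrative)."""
    def __init__(self, feat_dim=64, slot_dim=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(feat_dim + slot_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feats, slots):             # feats: (B, N, feat_dim), slots: (B, S, slot_dim)
        B, N, _ = feats.shape
        S = slots.shape[1]
        # sum of pairwise feature-slot compatibility scores -> scalar energy per image
        pairs = torch.cat([feats.unsqueeze(2).expand(B, N, S, -1),
                           slots.unsqueeze(1).expand(B, N, S, -1)], dim=-1)
        return self.score(pairs).sum(dim=(1, 2, 3))              # (B,)

def infer_slots(energy_net, feats, num_slots=5, slot_dim=64, steps=10, step_size=0.1, noise=0.01):
    """Infer slots by Langevin-style gradient descent on the learned energy."""
    slots = torch.randn(feats.shape[0], num_slots, slot_dim, device=feats.device, requires_grad=True)
    for _ in range(steps):
        energy = energy_net(feats, slots).sum()
        grad, = torch.autograd.grad(energy, slots)
        # noisy gradient step; re-attach as a leaf so the next iteration can differentiate again
        slots = (slots - step_size * grad + noise * torch.randn_like(slots)).detach().requires_grad_(True)
    return slots.detach()
```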
References
Showing 1–10 of 61 references
Conditional Object-Centric Learning from Video
- Computer Science · ICLR · 2022
Using the temporal dynamics of video data in the form of optical flow, and conditioning the model on simple object location cues, enables segmenting and tracking objects in significantly more realistic synthetic data, which could pave the way for a range of weakly-supervised approaches and allow more effective interaction with trained models.
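A minimal sketch of the conditioning idea described in this summary: initial slots are produced from simple object location cues (e.g. first-frame object centres) instead of random samples, and the training signal comes from reconstructing optical flow rather than RGB. The module names, cue format, and loss below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConditionalSlotInit(nn.Module):
    """Map simple location cues to initial slot vectors (illustrative)."""
    def __init__(self, cue_dim=2, slot_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(cue_dim, slot_dim), nn.ReLU(), nn.Linear(slot_dim, slot_dim))

    def forward(self, location_cues):      # (B, num_objects, 2) object centres in the first frame
        return self.mlp(location_cues)     # (B, num_objects, slot_dim) conditioned initial slots

def flow_reconstruction_loss(predicted_flow, target_flow):
    # exploit temporal dynamics: the decoder is trained to reconstruct optical flow, not pixels
    return ((predicted_flow - target_flow) ** 2).mean()
```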
Multi-Object Representation Learning with Iterative Variational Inference
- Computer Science · ICML · 2019
This work argues for the importance of learning to segment and represent objects jointly, and demonstrates that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations.
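The title names iterative variational inference; a minimal sketch of that general idea follows, assuming per-object posterior parameters are repeatedly updated by a refinement network driven by the gradient of an ELBO-style objective. The decoder, refinement network, and loss terms are placeholders, not the paper's exact components.

```python
import torch
import torch.nn as nn

def iterative_inference(image, decoder, refine_net, num_slots=4, latent_dim=32, steps=5):
    """Refine per-object posterior parameters over several inference steps (illustrative)."""
    B = image.shape[0]
    mu = torch.zeros(B, num_slots, latent_dim, requires_grad=True)
    logvar = torch.zeros(B, num_slots, latent_dim, requires_grad=True)
    for _ in range(steps):
        z = mu + (0.5 * logvar).exp() * torch.randn_like(mu)     # reparameterised sample per object
        recon = decoder(z)                                       # joint reconstruction of the scene
        nll = ((recon - image) ** 2).sum()                       # placeholder likelihood term
        kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum()   # KL to a unit-Gaussian prior
        grad_mu, grad_logvar = torch.autograd.grad(nll + kl, [mu, logvar], create_graph=True)
        # the refinement network maps current parameters and their gradients to an update
        delta = refine_net(torch.cat([mu, logvar, grad_mu, grad_logvar], dim=-1))
        d_mu, d_logvar = delta.chunk(2, dim=-1)
        mu, logvar = mu + d_mu, logvar + d_logvar
    return mu, logvar
```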
Generalization and Robustness Implications in Object-Centric Learning
- Computer Science · ICML · 2022
This paper trains state-of-the-art unsupervised models on five common multi-object datasets, evaluates segmentation accuracy and downstream object property prediction, and finds object-centric representations to be generally useful for downstream tasks and robust to shifts in the data distribution.
GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations
- Computer Science · ICLR · 2020
Generative latent-variable models are emerging as promising tools in robotics and reinforcement learning. Yet, even though tasks in these domains typically involve distinct objects, most…
Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
- Computer Science · NIPS · 2016
We present a framework for efficient inference in structured image models that explicitly reason about objects. We achieve this by performing probabilistic inference using a recurrent neural network…
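A minimal sketch of the recurrent "attend, infer, repeat" idea hinted at above: a recurrent network processes the image step by step, at each step inferring whether another object is present (z_pres), where it is (z_where), and what it looks like (z_what), stopping when no further object is inferred. The heads, dimensions, and the omission of the attention-window crop/decode steps are simplifying assumptions.

```python
import torch
import torch.nn as nn

class AttendInferRepeatSketch(nn.Module):
    def __init__(self, img_dim=784, hidden=256, z_what_dim=32, max_steps=3):
        super().__init__()
        self.max_steps = max_steps
        self.rnn = nn.GRUCell(img_dim, hidden)
        self.pres_head = nn.Linear(hidden, 1)          # probability that another object exists
        self.where_head = nn.Linear(hidden, 3)         # scale and x/y shift of an attention window
        self.what_head = nn.Linear(hidden, z_what_dim) # appearance latent of the attended object

    def forward(self, image_flat):                     # (B, img_dim) flattened image
        B = image_flat.shape[0]
        h = torch.zeros(B, self.rnn.hidden_size, device=image_flat.device)
        objects = []
        for _ in range(self.max_steps):                # variable number of objects, bounded above
            h = self.rnn(image_flat, h)
            z_pres = torch.sigmoid(self.pres_head(h))
            if (z_pres < 0.5).all():                   # no more objects inferred for any image
                break
            objects.append((z_pres, self.where_head(h), self.what_head(h)))
        return objects
```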
Object-Centric Learning with Slot Attention
- Computer Science · NeurIPS · 2020
This work presents an architectural component that interfaces with perceptual representations, such as the output of a convolutional neural network, and produces a set of task-dependent abstract representations that are exchangeable and can bind to any object in the input by specializing through a competitive procedure over multiple rounds of attention.
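A minimal sketch of that competitive attention procedure, assuming PyTorch; the single-head formulation and the omission of the residual MLP are simplifications of the published module.

```python
import torch
import torch.nn as nn

class SlotAttentionSketch(nn.Module):
    def __init__(self, num_slots=5, dim=64, iters=3):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, inputs):                         # inputs: (B, N, dim) CNN feature vectors
        B, N, D = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        # exchangeable initial slots sampled from a shared learned Gaussian
        slots = self.slots_mu + self.slots_logsigma.exp() * torch.randn(
            B, self.num_slots, D, device=inputs.device)
        for _ in range(self.iters):                    # multiple rounds of attention
            q = self.to_q(self.norm_slots(slots))
            attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=1)  # slots compete per pixel
            attn = attn / attn.sum(dim=-1, keepdim=True)                     # normalise for weighted mean
            updates = attn @ v                                               # (B, S, D)
            slots = self.gru(updates.reshape(-1, D), slots.reshape(-1, D)).reshape(B, -1, D)
        return slots
```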
GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement
- Computer Science · NeurIPS · 2021
This work proposes an embedding-based approach in which embeddings of pixels are clustered in a differentiable fashion using a stochastic stick-breaking process, yielding a new model, GENESIS-V2, that can infer a variable number of object representations without using RNNs or iterative refinement.
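A minimal sketch of differentiable stick-breaking clustering of pixel embeddings, loosely in the spirit of the summary above; the Gaussian kernel, random seed selection, and fixed number of steps are illustrative assumptions rather than the model's exact procedure.

```python
import torch

def stick_breaking_masks(embeddings, num_slots=5, sigma=1.0):
    """embeddings: (B, N, D) pixel embeddings; returns soft masks of shape (B, num_slots, N)."""
    B, N, D = embeddings.shape
    scope = torch.ones(B, N, device=embeddings.device)       # un-explained "stick" left per pixel
    masks = []
    for _ in range(num_slots - 1):
        # stochastically pick a seed pixel per image, favouring still-unexplained regions
        seed_idx = torch.multinomial(scope + 1e-8, 1)                             # (B, 1)
        seed = torch.gather(embeddings, 1, seed_idx.unsqueeze(-1).expand(B, 1, D))
        # attention of every pixel to the seed via a Gaussian kernel on embedding distance
        alpha = torch.exp(-((embeddings - seed) ** 2).sum(-1) / (2 * sigma ** 2))  # (B, N)
        masks.append(scope * alpha)                           # break off a piece of the stick
        scope = scope * (1.0 - alpha)                         # shrink what remains to be explained
    masks.append(scope)                                       # final slot absorbs the remainder
    return torch.stack(masks, dim=1)
```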
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
- Computer Science · ICLR · 2020
A generative latent-variable model, called SPACE, is proposed that provides a unified probabilistic modeling framework combining the best of spatial-attention and scene-mixture approaches and resolving the scalability problems of previous methods.
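A minimal, heavily simplified sketch of combining a spatial-attention foreground (a grid of cells, each with presence/where/what latents) with a mixture-style background; the module names, head layout, and compositing rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttentionForeground(nn.Module):
    """Per-cell object latents predicted from a convolutional feature map (illustrative)."""
    def __init__(self, feat_dim=128, z_what_dim=32):
        super().__init__()
        # per-cell heads: presence logit (1), bounding-box parameters (4), appearance latent
        self.head = nn.Conv2d(feat_dim, 1 + 4 + z_what_dim, kernel_size=1)

    def forward(self, feat_map):                   # (B, feat_dim, grid, grid)
        out = self.head(feat_map)
        z_pres = torch.sigmoid(out[:, :1])         # per-cell object presence
        z_where = torch.tanh(out[:, 1:5])          # per-cell box scale/shift
        z_what = out[:, 5:]                        # per-cell appearance latent
        return z_pres, z_where, z_what

def composite(fg_rgb, fg_alpha, bg_rgb):
    # foreground components occlude the mixture-model background where their alpha is high
    return fg_alpha * fg_rgb + (1 - fg_alpha) * bg_rgb
```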
Towards Self-Supervised Learning of Global and Object-Centric Representations
- Computer Science · ArXiv · 2022
This work shows that contrastive losses equipped with matching can be applied directly in a latent space, avoiding pixel-based reconstruction, and discusses key aspects of learning structured object-centric representations with self-supervision.
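A minimal sketch of a contrastive loss "equipped with matching" in latent space: object latents from two views of the same scene are paired by Hungarian matching on cosine similarity, and each matched pair is treated as a positive in an InfoNCE-style loss. This is an illustrative construction under those assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def matched_contrastive_loss(slots_a, slots_b, temperature=0.1):
    """slots_a, slots_b: (num_slots, dim) object latents from two augmented views of one scene."""
    a = F.normalize(slots_a, dim=-1)
    b = F.normalize(slots_b, dim=-1)
    sim = a @ b.t()                                                 # (S, S) cosine similarities
    row, col = linear_sum_assignment(-sim.detach().cpu().numpy())   # maximise total similarity
    logits = sim[torch.as_tensor(row, device=sim.device)] / temperature
    targets = torch.as_tensor(col, device=sim.device)               # matched column is the positive
    return F.cross_entropy(logits, targets)
```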
Online Object Representations with Contrastive Learning
- Computer Science · ArXiv · 2019
A self-supervised approach for learning representations of objects from monocular videos is proposed, and it is found that, given a limited set of objects, object correspondences naturally emerge when using contrastive learning without requiring explicit positive pairs.