Unsupervised Object Learning via Common Fate
@article{Tangemann2021UnsupervisedOL,
  title   = {Unsupervised Object Learning via Common Fate},
  author  = {Matthias Tangemann and Steffen Schneider and Julius von K{\"u}gelgen and Francesco Locatello and Peter Gehler and Thomas Brox and Matthias K{\"u}mmerer and Matthias Bethge and Bernhard Sch{\"o}lkopf},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2110.06562}
}
Learning generative object models from unlabelled videos is a long-standing problem that is required for causal scene modeling. We decompose this problem into three easier subtasks, and provide candidate solutions for each of them. Inspired by the Common Fate Principle of Gestalt Psychology, we first extract (noisy) masks of moving objects via unsupervised motion segmentation. Second, generative models are trained on the masks of the background and the moving objects, respectively. Third…
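The first subtask relies on the common-fate cue: pixels that move together are grouped into one object. A minimal sketch of that idea, using simple frame differencing on synthetic frames as a stand-in for the paper's unsupervised motion segmentation (the `motion_mask` helper and the threshold are illustrative assumptions, not the authors' method):

```python
import numpy as np

def motion_mask(frame_t, frame_t1, threshold=0.1):
    """Crude common-fate cue: pixels whose intensity changes between
    consecutive frames are treated as belonging to a moving object.
    This is only a stand-in for a real motion segmentation model."""
    diff = np.abs(frame_t1.astype(float) - frame_t.astype(float))
    return diff > threshold

# Synthetic example: a bright 3x3 square moves one pixel to the right
# against a static background.
frame_a = np.zeros((8, 8))
frame_b = np.zeros((8, 8))
frame_a[2:5, 2:5] = 1.0
frame_b[2:5, 3:6] = 1.0

mask = motion_mask(frame_a, frame_b)
# The mask highlights the trailing and leading edges of the square;
# the (noisy) masks would then supervise the generative models.
```

A real pipeline would use optical flow rather than raw differencing, which is why the extracted masks are noisy and the later generative-model stages must be robust to them.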
7 Citations
Unsupervised Segmentation in Real-World Images via Spelke Object Inference
- Computer Science, ECCV
- 2022
This work shows how to learn static grouping priors from motion self-supervision, building on the cognitive science notion of Spelke Objects: groupings of stuff that move together, and introduces Excitatory-Inhibitory Segment Extraction Network (EISEN), which learns from optical flow estimates to extract pairwise affinity graphs for static scenes.
Boosting Object Representation Learning via Motion and Object Continuity
- Computer Science, ArXiv
- 2022
This work proposes to exploit object motion and continuity, i.e., objects do not pop in and out of existence, and shows clear benefits of integrating motion and object continuity for downstream tasks, moving beyond object representation learning based only on reconstruction.
Discovering Objects that Can Move
- Computer Science, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
This paper simplifies the recent auto-encoder based frameworks for unsupervised object discovery, and augments the resulting model with a weak learning signal from general motion segmentation algorithms, which is enough to generalize to segment both moving and static instances of dynamic objects.
Self-supervised Amodal Video Object Segmentation
- Computer Science, ArXiv
- 2022
A novel self-supervised learning paradigm that efficiently utilizes the visible object parts as supervision to guide training on videos, which achieves state-of-the-art performance on the synthetic amodal segmentation benchmark FISHBOWL and the real-world benchmark KINS-Video-Car.
TAP-Vid: A Benchmark for Tracking Any Point in a Video
- Computer Science, ArXiv
- 2022
A novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing annotators to focus on harder sections of video, and proposes a simple end-to-end point tracking model, TAP-Net, which outperforms all prior methods on the authors' benchmark when trained on synthetic data.
Bridging the Gap to Real-World Object-Centric Learning
- Computer Science, ArXiv
- 2022
DINOSAUR is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC, and shows competitive performance compared to more involved pipelines from the computer vision literature.
Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos
- Computer Science, ArXiv
- 2022
This paper proposes STEVE, an unsupervised model for object-centric learning in videos that uses a transformer-based image decoder conditioned on slots, with a learning objective that is simply to reconstruct the observation.
References
Towards causal generative scene models via competition of experts
- Computer Science, ArXiv
- 2020
This work presents an alternative approach which uses an inductive bias encouraging modularity by training an ensemble of generative models (experts) and allows for controllable sampling of individual objects and recombination of experts in physically plausible ways.
Multi-Object Representation Learning with Iterative Variational Inference
- Computer Science, ICML
- 2019
This work argues for the importance of learning to segment and represent objects jointly, and demonstrates that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations.
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition
- Computer Science, ICLR
- 2020
A generative latent variable model, called SPACE, is proposed that provides a unified probabilistic modeling framework that combines the best of spatial-attention and scene-mixture approaches and resolves the scalability problems of previous methods.
Generative Modeling of Infinite Occluded Objects for Compositional Scene Representation
- Computer Science, ICML
- 2019
A deep generative model which explicitly models object occlusions for compositional scene representation, and outperforms two state-of-the-art methods when object occlusions exist, is presented.
Unsupervised Moving Object Detection via Contextual Information Separation
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
An adversarial contextual model for detecting moving objects in images that can be thought of as a generalization of classical variational generative region-based segmentation, but in a way that avoids explicit regularization or solution of partial differential equations at run-time.
GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
The key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis; a fast and realistic image synthesis model is proposed.
Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects
- Computer Science, NeurIPS
- 2018
SQAIR is an interpretable deep generative model for image sequences that can reliably discover and track objects through the sequence; it can also conditionally generate future frames, thereby simulating expected motion of objects.
Learning a Generative Model of Images by Factoring Appearance and Shape
- Computer Science, Neural Computation
- 2011
This work introduces a basic model, the masked RBM, which explicitly models occlusion boundaries in image patches by factoring the appearance of any patch region from its shape, and proposes a generative model of larger images using a field of such RBMs.
Picture: A probabilistic programming language for scene perception
- Computer Science, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
Picture is presented, a probabilistic programming language for scene understanding that allows researchers to express complex generative vision models, while automatically solving them using fast general-purpose inference machinery.
Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
- Computer Science, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
This work addresses the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions with Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems.