Unsupervised Image Representation Learning with Deep Latent Particles
@inproceedings{Daniel2022UnsupervisedIR,
  title={Unsupervised Image Representation Learning with Deep Latent Particles},
  author={Tal Daniel and Aviv Tamar},
  booktitle={International Conference on Machine Learning},
  year={2022}
}
We propose a new representation of visual data that disentangles object position from appearance. Our method, termed Deep Latent Particles (DLP), decomposes the visual input into low-dimensional latent “particles”, where each particle is described by its spatial location and features of its surrounding region. To drive learning of such representations, we follow a VAE-based approach and introduce a prior for particle positions based on a spatial-softmax architecture, and a modification of the…
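The spatial-softmax idea mentioned in the abstract can be illustrated with a minimal sketch. This is a generic spatial softmax over heatmaps, not the authors' DLP implementation; the function name `spatial_softmax_keypoints`, the `(K, H, W)` input layout, and the `[-1, 1]` coordinate range are assumptions chosen for illustration.

```python
import numpy as np


def spatial_softmax_keypoints(heatmaps):
    """Turn K heatmaps of shape (K, H, W) into K expected (x, y) positions.

    Hypothetical helper: coordinates are normalised to [-1, 1], with x
    running along the width axis and y along the height axis.
    """
    k, h, w = heatmaps.shape

    # Softmax over all H*W pixels of each heatmap (max-subtracted for stability).
    flat = heatmaps.reshape(k, h * w)
    flat = flat - flat.max(axis=1, keepdims=True)
    probs = np.exp(flat)
    probs = probs / probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(k, h, w)

    # Pixel-centre coordinates on a normalised grid.
    xs = np.linspace(-1.0, 1.0, w)
    ys = np.linspace(-1.0, 1.0, h)

    # Expected coordinate = sum of grid coordinate times marginal probability.
    x_mean = (probs.sum(axis=1) * xs).sum(axis=1)  # marginal over rows -> (K, W)
    y_mean = (probs.sum(axis=2) * ys).sum(axis=1)  # marginal over columns -> (K, H)
    return np.stack([x_mean, y_mean], axis=1)  # shape (K, 2)


# Usage: one sharply peaked heatmap and one flat heatmap.
heatmaps = np.zeros((2, 5, 5))
heatmaps[0, 0, 4] = 50.0  # strong peak in the top-right pixel
keypoints = spatial_softmax_keypoints(heatmaps)
print(keypoints)
```

A sharply peaked heatmap yields a position close to the peak, while a flat heatmap yields the image centre. Because the output is an expectation rather than an argmax, it stays differentiable with respect to the heatmap values.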
References
Unsupervised Discovery of Object Landmarks as Structural Representations
- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This paper proposes an autoencoding formulation to discover landmarks as explicit structural representations, which naturally creates an unsupervised, perceptible interface to manipulate object shapes and decode images with controllable structures.
Unsupervised Learning of Object Keypoints for Perception and Control
- Computer Science
- NeurIPS
- 2019
Transporter is introduced, a neural network architecture for discovering concise geometric object representations in terms of keypoints or image-space coordinates that helps track objects and object parts across long time-horizons more accurately than recent similar methods.
Multi-Object Representation Learning with Iterative Variational Inference
- Computer Science
- ICML
- 2019
This work argues for the importance of learning to segment and represent objects jointly, and demonstrates that, starting from the simple assumption that a scene is composed of multiple entities, it is possible to learn to segment images into interpretable objects with disentangled representations.
Unsupervised Learning of Object Landmarks through Conditional Image Generation
- Computer Science
- NeurIPS
- 2018
This work proposes a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision and introduces a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features.
Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022
The proposed factorization results in landmarks that are focused on the foreground object of interest when measured against ground-truth foreground masks, and the rendered background quality is improved as ill-suited landmarks are no longer forced to model this content.
GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement
- Computer Science
- NeurIPS
- 2021
This work proposes an embedding-based approach in which embeddings of pixels are clustered in a differentiable fashion using a stochastic stick-breaking process to develop a new model, GENESIS-V2, which can infer a variable number of object representations without using RNNs or iterative refinement.
Unsupervised Learning of Object Landmarks by Factorized Spatial Embeddings
- Computer Science
- 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
This paper proposes a novel unsupervised approach that can discover and learn landmarks in object categories, thus characterizing their structure, and shows that the learned landmarks establish meaningful correspondences between different object instances in a category without having to impose this requirement explicitly.
Sequential Attend, Infer, Repeat: Generative Modelling of Moving Objects
- Computer Science
- NeurIPS
- 2018
SQAIR is an interpretable deep generative model for image sequences that can reliably discover and track objects through the sequence; it can also conditionally generate future frames, thereby simulating expected motion of objects.
Unsupervised Learning of Object Structure and Dynamics from Videos
- Computer Science
- NeurIPS
- 2019
A keypoint-based image representation is adopted and a stochastic dynamics model of the keypoints is learned that outperforms unstructured representations on a range of motion-related tasks such as object tracking, action recognition and reward prediction.
Unsupervised learning of object frames by dense equivariant image labelling
- Computer Science
- NIPS
- 2017
A new approach is proposed that, given a large number of images of an object and no other supervision, can extract a dense object-centric coordinate frame that is invariant to deformations of the images and comes with a dense equivariant labelling neural network that can map image pixels to their corresponding object coordinates.