Corpus ID: 235390426

Unsupervised Co-part Segmentation through Assembly

@inproceedings{Gao2021UnsupervisedCS,
  title={Unsupervised Co-part Segmentation through Assembly},
  author={Qingzhe Gao and Bin Wang and Libin Liu and Baoquan Chen},
  booktitle={ICML},
  year={2021}
}
Co-part segmentation is an important problem in computer vision for its rich applications. We propose an unsupervised learning approach for co-part segmentation from images. For the training stage, we leverage motion information embedded in videos and explicitly extract latent representations to segment meaningful object parts. More importantly, we introduce a dual procedure of part-assembly to form a closed loop with part-segmentation, enabling an effective self-supervision. We demonstrate the… 
Citations

HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
TLDR
Hierarchical Parsing Capsule Network (HP-Capsule) is proposed, which extends the application of capsule networks from digits to human faces and takes a step toward showing how neural networks understand homologous objects without human intervention.
GANSeg: Learning to Segment by Unsupervised Hierarchical Image Generation
TLDR
This work proposes a GAN-based approach that generates images conditioned on latent masks, alleviating the full or weak annotations required by previous approaches, and shows that such mask-conditioned image generation can be learned faithfully when the masks are conditioned hierarchically on 2D latent points that explicitly define part positions.

References

SHOWING 1-10 OF 35 REFERENCES
SCOPS: Self-Supervised Co-Part Segmentation
TLDR
This work proposes a self-supervised deep learning approach for part segmentation, devising several loss functions that aid in predicting part segments that are geometrically concentrated, robust to object variations, and semantically consistent across different object instances.
Segmentation of Moving Objects by Long Term Video Analysis
TLDR
This paper demonstrates that motion is exploited most effectively when regarded over larger time windows, and suggests a paradigm that starts with semi-dense motion cues and then fills in textureless areas based on color.
Semantic part segmentation using compositional model combining shape and appearance
  • Jianyu Wang, A. Yuille
  • Computer Science
    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
TLDR
This paper builds a mixture of compositional models to represent the object boundary and the boundaries of semantic parts, and incorporates edge, appearance, and semantic part cues into the compositional model.
Unsupervised Learning of Object Landmarks through Conditional Image Generation
TLDR
This work proposes a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision and introduces a tight bottleneck in the geometry-extraction process that selects and distils geometry-related features.
Unsupervised Discovery of Parts, Structure, and Dynamics
TLDR
A novel formulation that simultaneously learns a hierarchical, disentangled object representation and a dynamics model for object parts from unlabeled videos is proposed.
First Order Motion Model for Image Animation
TLDR
This framework decouples appearance and motion information using a self-supervised formulation and uses a representation consisting of a set of learned keypoints, along with their local affine transformations, to support complex motions.
Articulated part-based model for joint object detection and pose estimation
TLDR
An Articulated Part-based Model (APM) for jointly detecting objects and estimating their poses is proposed; extensive quantitative and qualitative experiments on public datasets show that APM outperforms state-of-the-art methods.
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture
  • D. Eigen, R. Fergus
  • Computer Science
    2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. We use a multiscale convolutional…
Learning Articulated Structure and Motion
TLDR
This work models the structure of one or more articulated objects, given a time series of two-dimensional feature positions, in terms of “stick figure” objects, under the assumption that the relative joint angles between sticks can change over time while their lengths and connectivities are fixed.
DensePose: Dense Human Pose Estimation in the Wild
TLDR
This work establishes dense correspondences between an RGB image and a surface-based representation of the human body, a task referred to as dense human pose estimation, and improves accuracy through cascading, yielding a system that delivers highly accurate results at multiple frames per second on a single GPU.