Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition

@article{Zheng2021VisitingTI,
  title={Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition},
  author={Chuanxia Zheng and Duy-Son Dao and Guoxian Song and Tat-Jen Cham and Jianfei Cai},
  journal={arXiv preprint arXiv:2104.05367},
  year={2021}
}
Existing scene understanding systems mainly focus on recognizing the visible parts of a scene, ignoring the intact appearance of physical objects in the real world. Concurrently, image completion aims to create a plausible appearance for invisible regions, but requires a manual mask as input. In this work, we propose a higher-level scene understanding system that tackles both the visible and invisible parts of objects and backgrounds in a given scene. In particular, we built a system to…
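The abstract suggests an iterative "layer-by-layer" procedure: repeatedly segment the objects that are currently fully visible, remove them, and complete the regions they hid. A rough sketch of that loop, where the two model callables (`segment_unoccluded`, `complete_region`) are hypothetical stand-ins rather than the authors' actual networks:

```python
def decompose(scene, segment_unoccluded, complete_region, max_layers=10):
    """Peel off fully visible objects layer by layer; return the ordered
    object layers plus the fully completed background."""
    layers = []
    for _ in range(max_layers):
        objects = segment_unoccluded(scene)  # objects with nothing on top
        if not objects:
            break                            # only background remains
        layers.append(objects)
        for obj in objects:
            scene = complete_region(scene, obj)  # inpaint what it covered
    return layers, scene
```

With toy stand-ins (a scene as a depth-ordered stack), the loop yields the occlusion order of objects and the recovered background.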

References

Showing 1–10 of 70 references.
Hybrid Task Cascade for Instance Segmentation
TLDR: This work proposes a new framework, Hybrid Task Cascade (HTC), which differs in two important aspects: (1) instead of performing cascaded refinement on the two tasks separately, it interweaves them for joint multi-stage processing; (2) it adopts a fully convolutional branch to provide spatial context, which helps distinguish hard foreground from cluttered background.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene…
Self-Supervised Scene De-Occlusion
TLDR: This paper makes the first attempt to address the problem of scene de-occlusion through a novel and unified framework that recovers hidden scene structures without ordering or amodal annotations as supervision, via Partial Completion Network (PCNet)-mask (M) and -content (C), which learn to recover fractions of object masks and contents, respectively, in a self-supervised manner.
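The self-supervised trick behind this kind of partial-completion training is to synthesize supervision: take an object's fully visible mask, occlude it with another object's mask, and train the model to recover the original. A minimal sketch of the training-pair construction (function name and interface are illustrative, not PCNet's actual API):

```python
import numpy as np

def make_training_pair(target_mask, occluder_mask):
    """Return (partially occluded input mask, ground-truth full mask)
    for mask completion, without any amodal labels."""
    visible = target_mask & ~occluder_mask  # what survives the synthetic occlusion
    return visible, target_mask
```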
Amodal Instance Segmentation With KINS Dataset
TLDR: A network structure that reasons about invisible parts via a new multi-task framework with Multi-View Coding (MVC), combining information across recognition levels, is proposed; it effectively improves both amodal and inmodal segmentation.
SeGAN: Segmenting and Generating the Invisible
TLDR: This paper studies the challenging problem of completing the appearance of occluded objects and proposes a novel solution, SeGAN, which outperforms state-of-the-art segmentation baselines for the invisible parts of objects.
Semantic Amodal Segmentation
TLDR: A detailed image annotation that captures information beyond the visible pixels and requires complex reasoning about full scene structure is proposed, and it is shown that the proposed full scene annotation is surprisingly consistent between annotators, including for regions and edges.
Semantic Scene Completion from a Single Depth Image
TLDR: The semantic scene completion network (SSCNet) is introduced, an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum.
Context Encoders: Feature Learning by Inpainting
TLDR: It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
Auto-Encoding Variational Bayes
TLDR: A stochastic variational inference and learning algorithm is introduced that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.
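Two ingredients from this paper recur in the amodal-completion work above: the reparameterization trick and the closed-form KL term of the ELBO for a Gaussian encoder against a unit-Gaussian prior. A minimal NumPy sketch, illustrative only and not the paper's full training algorithm:

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps, with eps ~ N(0, I); this form lets gradients
    # flow through mu and log_var during training.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dims.
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
```

The KL term vanishes exactly when the encoder matches the prior (mu = 0, log_var = 0), which is a quick sanity check for an implementation.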
Variational Amodal Object Completion
TLDR: A variational generative framework for amodal completion, referred to as Amodal-VAE, is proposed that utilizes widely available object instance masks; human studies show that object completions inferred by the model are preferred to human-labeled ones.