Panoptic 3D Scene Reconstruction From a Single RGB Image
@article{Dahnert2021Panoptic3S,
  title={Panoptic 3D Scene Reconstruction From a Single RGB Image},
  author={Manuel Dahnert and Ji Hou and Matthias Nie{\ss}ner and Angela Dai},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.02444}
}
Understanding 3D scenes from a single image is fundamental to a wide variety of tasks, such as robotics, motion planning, and augmented reality. Existing works in 3D perception from a single RGB image tend to focus on geometric reconstruction only, or on geometric reconstruction combined with semantic segmentation or instance segmentation. Inspired by 2D panoptic segmentation, we propose to unify the tasks of geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the task…
18 Citations
Learning 3D Scene Priors with 2D Supervision
- Computer Science · ArXiv
- 2022
This work proposes a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth, and achieves state-of-the-art results in scene synthesis against baselines which require 3D supervision.
Panoptic Lifting for 3D Scene Understanding with Neural Fields
- Computer Science · ArXiv
- 2022
We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. Once trained, our model can render color images together with…
SceneRF: Self-Supervised Monocular 3D Scene Reconstruction with Radiance Fields
- Computer Science · ArXiv
- 2022
SceneRF is proposed, a self-supervised monocular scene reconstruction method based on neural radiance fields (NeRF) learned from multiple posed image sequences; new geometry constraints and a novel probabilistic sampling strategy are introduced to improve geometry prediction.
MonoScene: Monocular 3D Semantic Scene Completion
- Computer Science · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
Experiments show the MonoScene framework outperforms the literature on all metrics and datasets while hallucinating plausible scenery even beyond the camera field of view; a 3D context relation prior is introduced to enforce spatio-semantic consistency.
Neural RGB-D Surface Reconstruction
- Computer Science · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
This work proposes to represent the surface using an implicit function (truncated signed distance function), and shows how to incorporate this representation in the NeRF framework, and extend it to use depth measurements from a commodity RGB-D sensor, such as a Kinect.
AutoRF: Learning 3D Object Radiance Fields from Single View Observations
- Computer Science · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
It is shown that the AutoRF method generalizes well to unseen objects, even across different datasets of challenging real-world street scenes such as nuScenes, KITTI, and Mapillary Metropolis.
Joint stereo 3D object detection and implicit surface reconstruction
- Computer Science · ArXiv
- 2021
This approach features a new instance-level network that explicitly models the unseen surface hallucination problem using point-based representations and uses a new geometric representation for orientation refinement.
Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation
- Computer Science · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2022
Panoptic Neural Fields is presented, an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff), and that can be smaller and faster than previous object-aware approaches while still leveraging category-specific priors incorporated via meta-learned initialization.
Neural rendering in a room
- Computer Science · ACM Trans. Graph.
- 2022
A novel solution that mimics this human perception capability, based on a new paradigm of amodal 3D scene understanding with neural rendering for a closed scene, exploiting compositional neural rendering techniques for data augmentation during offline training.
3D Multi-Object Tracking with Differentiable Pose Estimation
- Computer Science · ArXiv
- 2022
A graph-based, fully end-to-end-learnable approach for joint 3D multi-object tracking and reconstruction from RGB-D sequences in indoor environments that improves the accumulated MOTA score for all test sequences by 24.8% over existing state-of-the-art methods.
References
Showing 1–10 of 44 references
Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
- Computer Science · ECCV
- 2018
A Holistic Scene Grammar (HSG) is introduced to represent the 3D scene structure, which characterizes a joint distribution over the functional and geometric space of indoor scenes, and significantly outperforms prior methods on 3D layout estimation, 3D object detection, and holistic scene understanding.
3D Scene Reconstruction With Multi-Layer Depth and Epipolar Transformers
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
To improve the accuracy of view-centered representations for complex scenes, this work introduces a novel "Epipolar Feature Transformer" that transfers convolutional network features from an input view to other virtual camera viewpoints, and thus better covers the 3D scene geometry.
3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans
- Computer Science · 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
3D-SIS is introduced, a novel neural network architecture for 3D semantic instance segmentation in commodity RGB-D scans that leverages high-resolution RGB input by associating 2D images with the volumetric grid based on the pose alignment of the 3D reconstruction.
CoReNet: Coherent 3D scene reconstruction from a single RGB image
- Computer Science · ECCV
- 2020
The model is adapted to address the harder task of reconstructing multiple objects from a single image, producing a coherent reconstruction, where all objects live in a single consistent 3D coordinate frame relative to the camera and they do not intersect in 3D space.
3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation
- Computer Science · ECCV
- 2018
3DMV is presented, a novel method for 3D semantic scene segmentation of RGB-D scans in indoor environments using a joint 3D-multi-view prediction network that achieves significantly better results than existing baselines.
3D Scene Reconstruction from a Single Viewport
- Computer Science · ECCV
- 2020
A novel approach to infer volumetric reconstructions from a single viewport, based only on an RGB image and a reconstructed normal image; it introduces a novel loss-shaping technique for 3D data that guides the learning process towards regions where free and occupied space are close to each other.
Semantic Scene Completion from a Single Depth Image
- Computer Science · 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
The semantic scene completion network (SSCNet) is introduced, an end-to-end 3D convolutional network that takes a single depth image as input and simultaneously outputs occupancy and semantic labels for all voxels in the camera view frustum.
Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
- Computer Science · ECCV
- 2020
Mask2CAD is presented, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose; it constructs a joint embedding space between the detected image regions corresponding to an object and 3D CAD models, enabling retrieval of CAD models for an input RGB image.
RevealNet: Seeing Behind Objects in RGB-D Scans
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
RevealNet is a new data-driven approach that jointly detects object instances and predicts their complete geometry, which enables a semantically meaningful decomposition of a scanned scene into individual, complete 3D objects, including hidden and unobserved object parts.
PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points
- Computer Science · NeurIPS
- 2019
Detecting 3D objects from a single RGB image is intrinsically ambiguous, thus requiring appropriate prior knowledge and intermediate representations as constraints to reduce the uncertainties and…