PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things

@article{Narita2019PanopticFusionOV,
  title={PanopticFusion: Online Volumetric Semantic Mapping at the Level of Stuff and Things},
  author={Gaku Narita and Takashi Seno and Tomoya Ishikawa and Yohsuke Kaji},
  journal={2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2019},
  pages={4205-4212}
}
We propose PanopticFusion, a novel online volumetric semantic mapping system at the level of stuff and things. In contrast to previous semantic mapping systems, PanopticFusion densely predicts class labels for the background region (stuff) and individually segments arbitrary foreground objects (things). In addition, our system can reconstruct a large-scale scene and extract a labeled mesh thanks to its spatially hashed volumetric map representation. Our system…
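
To make the map representation concrete, here is a minimal Python sketch. It is not the authors' implementation: the voxel size, block dimension, and winner-takes-all label rule are our assumptions. It only mimics the key idea, namely that 2D panoptic predictions (class labels for stuff, instance IDs for things) are fused voxel-wise into a spatially hashed truncated signed distance field (TSDF), so blocks are allocated lazily and only where the scene is actually observed.

import numpy as np
from collections import defaultdict

VOXEL_SIZE = 0.05  # voxel edge length in meters (assumed value)
BLOCK_DIM = 8      # voxels per side of a hashed block (assumed value)

class Voxel:
    """TSDF value plus a fused panoptic label (class or instance ID)."""
    __slots__ = ("tsdf", "weight", "label", "label_weight")
    def __init__(self):
        self.tsdf, self.weight = 1.0, 0.0
        self.label, self.label_weight = -1, 0.0  # -1 = unobserved

def _new_block():
    return [[[Voxel() for _ in range(BLOCK_DIM)]
             for _ in range(BLOCK_DIM)]
            for _ in range(BLOCK_DIM)]

class HashedVolume:
    """Spatially hashed volume: blocks are allocated lazily by integer
    key, which is what lets a large-scale scene fit in memory."""
    def __init__(self):
        self.blocks = defaultdict(_new_block)

    def voxel_at(self, p):
        """Look up (and lazily allocate) the voxel containing 3D point p."""
        v = np.floor(np.asarray(p) / VOXEL_SIZE).astype(int)
        block_key = tuple(v // BLOCK_DIM)
        lx, ly, lz = v % BLOCK_DIM
        return self.blocks[block_key][lx][ly][lz]

def integrate(vox, sdf, panoptic_id, trunc=0.15, max_weight=100.0):
    """Standard weighted-average TSDF update, plus a simple
    winner-takes-all label vote (an assumption, not the paper's rule)."""
    d = float(np.clip(sdf / trunc, -1.0, 1.0))
    vox.tsdf = (vox.tsdf * vox.weight + d) / (vox.weight + 1.0)
    vox.weight = min(vox.weight + 1.0, max_weight)
    if panoptic_id == vox.label:
        vox.label_weight += 1.0
    else:
        vox.label_weight -= 1.0  # competing observation decays the vote
        if vox.label_weight < 0.0:
            vox.label, vox.label_weight = panoptic_id, 1.0

A depth frame would be integrated by back-projecting each pixel to a 3D point, calling voxel_at for voxels near that point, and passing the pixel's panoptic ID to integrate; a labeled mesh can then be extracted from the TSDF with marching cubes.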

Citations

Volumetric Instance-Level Semantic Mapping Via Multi-View 2D-to-3D Label Diffusion
TLDR
This work presents a novel approach to progressively build instance-level, dense 3D maps from color and depth cues acquired by either a moving RGB-D sensor or a camera-LiDAR setup, whose pose is being tracked.
Panoptic Neural Fields: A Semantic Object-Aware Neural Scene Representation
TLDR
Panoptic Neural Fields is presented, an object-aware neural scene representation that decomposes a scene into a set of objects (things) and background (stuff); it can be smaller and faster than previous object-aware approaches while still leveraging category-specific priors incorporated via meta-learned initialization.
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
TLDR
This work proposes Scan2Cap, an end-to-end trained method to detect objects in the input scene and describe them in natural language, which can effectively localize and describe 3D objects in scenes from the ScanRefer dataset, outperforming 2D baseline methods by a significant margin.
A Benchmark for LiDAR-based Panoptic Segmentation based on KITTI
TLDR
This paper presents an extension of SemanticKITTI, a large-scale dataset providing dense point-wise semantic labels for all sequences of the KITTI Odometry Benchmark, and presents two strong baselines that combine state-of-the-art LiDAR-based semantic segmentation approaches with a state-of-the-art detector enriching the segmentation with instance information.
Panoptic Multi-TSDFs: a Flexible Representation for Online Multi-resolution Volumetric Mapping and Long-term Dynamic Scene Consistency
TLDR
This work proposes panoptic multi-TSDFs, a novel representation for multi-resolution volumetric mapping over long periods of time, which makes it possible to maintain up-to-date reconstructions with high accuracy while improving coverage by incorporating and fusing previous data.
Semantic Dense Reconstruction with Consistent Scene Segments
TLDR
A novel semantic projection block (SP-Block) is proposed to extract deep feature volumes from 2D segments of different views; these volumes are fused with those from a point-cloud encoder to produce the final semantic segmentation.
Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation
TLDR
A novel fusion-aware 3D point convolution is proposed that operates directly on the geometric surface being reconstructed and effectively exploits inter-frame correlation for high-quality 3D feature learning.
Dynamic Convolution for 3D Point Cloud Instance Segmentation
TLDR
An approach to instance segmentation from 3D point clouds based on dynamic convolution, which enables the network to adapt at inference to varying feature and object scales; it yields strong performance on various datasets: ScanNetV2, S3DIS, and PartNet.
Global Context Reasoning for Semantic Segmentation of 3D Point Clouds
TLDR
Experimental results show that the proposed PointGCR module efficiently captures global contextual dependencies and significantly improves the segmentation performance of several existing networks.
Supervoxel Convolution for Online 3D Semantic Segmentation
TLDR
The extensive evaluations on the public 3D indoor scene datasets show that the proposed Supervoxel-CNN approach significantly outperforms the existing online semantic segmentation systems in terms of efficiency or accuracy.

References

Showing 1-10 of 38 references
SemanticFusion: Dense 3D semantic mapping with convolutional neural networks
TLDR
This work combines Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories, and produces a useful semantic 3D map.
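For context, SemanticFusion's fusion rule is a recursive Bayesian update of each surface element's class distribution (notation ours): with every new frame I_k, the CNN's per-pixel class probabilities are multiplied into the running distribution and renormalized,

  P(l | I_1:k) = (1/Z) · P(l | I_k) · P(l | I_1:k-1),

where Z normalizes over all classes so that each element's distribution sums to one.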
Dense 3D semantic mapping of indoor scenes from RGB-D images
TLDR
A novel 2D-3D label transfer based on Bayesian updates and dense pairwise 3D Conditional Random Fields is presented, and it is shown that a semantic segmentation is not needed for every frame in a sequence in order to create accurate semantic 3D reconstructions.
MaskFusion: Real-Time Recognition, Tracking and Reconstruction of Multiple Moving Objects
Martin Rünz and L. Agapito. 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2018.
TLDR
This work presents MaskFusion, a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems, which output a purely geometric map of a static scene, and takes full advantage of instance-level semantic segmentation to fuse semantic labels into an object-aware map.
Panoptic Segmentation
TLDR
A novel panoptic quality (PQ) metric is proposed that captures performance for all classes (stuff and things) in an interpretable and unified manner, and a rigorous study of both human and machine performance for PS on three existing datasets is performed, revealing interesting insights about the task.
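Concretely, PQ matches predicted and ground-truth segments (a pair counts as a true positive only if IoU > 0.5, which makes the matching unique) and is defined as

  PQ = Σ_{(p,g) ∈ TP} IoU(p,g) / (|TP| + ½|FP| + ½|FN|),

which factors into segmentation quality (mean IoU over matched pairs) multiplied by recognition quality (an F1-style detection score).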
3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans
TLDR
3D-SIS is introduced, a novel neural network architecture for 3D semantic instance segmentation in commodity RGB-D scans that leverages high-resolution RGB input by associating 2D images with the volumetric grid based on the pose alignment of the 3D reconstruction.
Semantic Understanding of Scenes Through the ADE20K Dataset
TLDR
This work presents a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that the networks trained on this dataset are able to segment a wide variety of scenes and objects.
Semantic 3D occupancy mapping through efficient high order CRFs
TLDR
An incremental and (near) real-time semantic mapping system that utilizes the CNN segmentation as a prior prediction and further optimizes 3D grid labels through a novel CRF model to represent the world.
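Such mapping systems typically minimize a CRF energy over grid labels x of the form (the cited work additionally introduces higher-order terms; the formulation below is only the standard unary-plus-pairwise core)

  E(x) = Σ_i ψ_u(x_i) + Σ_{i<j} ψ_p(x_i, x_j),

where the unary potential ψ_u(x_i) comes from the projected CNN prediction for cell i and the pairwise potential ψ_p penalizes label disagreement between neighboring cells.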
3DMV: Joint 3D-Multi-View Prediction for 3D Semantic Scene Segmentation
TLDR
3DMV is presented, a novel method for 3D semantic scene segmentation of RGB-D scans in indoor environments using a joint 3D-multi-view prediction network that achieves significantly better results than existing baselines.
ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes
TLDR
This work introduces ScanNet, an RGB-D video dataset containing 2.5M views in 1513 scenes annotated with 3D camera poses, surface reconstructions, and semantic segmentations, and shows that using this data helps achieve state-of-the-art performance on several 3D scene understanding tasks.
Efficient Object-Oriented Semantic Mapping With Object Detector
TLDR
A novel object-oriented semantic mapping approach that overcomes real-time processing issues by performing highly accurate object-oriented semantic scene reconstruction in real time, and complements geometry-based segmentation results by moving beyond a geometry-only representation to a semantic-aware one.