Similarity-Aware Fusion Network for 3D Semantic Segmentation

@inproceedings{Zhao2021SimilarityAwareFN,
  title={Similarity-Aware Fusion Network for 3D Semantic Segmentation},
  author={Linqing Zhao and Jiwen Lu and Jie Zhou},
  booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2021},
  pages={1585-1592}
}
  • Linqing Zhao, Jiwen Lu, Jie Zhou
  • Published 4 July 2021
  • Computer Science
  • 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
In this paper, we propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation. Existing fusion-based methods achieve superior performance by integrating information from multiple modalities. However, they rely heavily on the projection-based correspondence between 2D pixels and 3D points and can only perform information fusion in a fixed manner, so their performance cannot be easily migrated to a more realistic…
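The projection-based 2D-3D correspondence the abstract refers to is the standard step of projecting each 3D point into the image plane and reading off the 2D feature at that pixel. A minimal sketch of that step is below, assuming a pinhole camera model with hypothetical intrinsics; the function names and values are illustrative, not from the paper.

```python
import numpy as np

def project_points(points_xyz, K):
    """Project 3D camera-frame points to 2D pixel coordinates (pinhole model)."""
    # points_xyz: (N, 3) points in the camera frame; K: (3, 3) intrinsics
    uvw = points_xyz @ K.T            # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]   # perspective divide -> (N, 2) pixels

def gather_2d_features(feat_map, uv):
    """Fetch the 2D feature at each projected pixel (nearest-neighbor lookup)."""
    h, w = feat_map.shape[:2]
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    return feat_map[v, u]             # (N, C) per-point 2D features

# Hypothetical intrinsics and data, for illustration only
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
points = np.array([[0.0,  0.0, 2.0],   # on the optical axis -> principal point
                   [0.4, -0.2, 2.0]])
uv = project_points(points, K)          # first point lands at (320, 240)
feat_map = np.random.rand(480, 640, 16)
point_feats = gather_2d_features(feat_map, uv)
```

This fixed, geometry-only lookup is exactly the "fixed manner" of fusion the paper argues against: each point gets whatever feature its projection hits, regardless of how reliable that 2D evidence is.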
1 Citation

Figures and Tables from this paper

Learning Hybrid Semantic Affinity for Point Cloud Segmentation
TLDR
This paper presents a hybrid semantic affinity learning method (HSA) to capture and leverage the dependencies of categories for 3D semantic segmentation and proposes the concept of local affinity to effectively model the intra-class and inter-class semantic similarities for adjacent neighborhoods.

References

SHOWING 1-10 OF 51 REFERENCES
Fusion-Aware Point Convolution for Online Semantic 3D Scene Segmentation
TLDR
A novel fusion-aware 3D point convolution is proposed that operates directly on the geometric surface being reconstructed and effectively exploits inter-frame correlation for high-quality 3D feature learning.
Dense 3D semantic mapping of indoor scenes from RGB-D images
TLDR
A novel 2D-3D label transfer based on Bayesian updates and dense pairwise 3D Conditional Random Fields is presented, and it is shown that a semantic segmentation is not needed for every frame in a sequence in order to create accurate semantic 3D reconstructions.
Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations
TLDR
This paper enriches each point representation by performing a novel gated fusion on the point itself and its contextual point representations, and proposes a novel graph pointnet module, relying on the graph attention block to dynamically compose and update each point representation within the local point cloud structure.
A Unified Point-Based Framework for 3D Segmentation
TLDR
A new unified point-based framework for 3D point cloud segmentation that effectively optimizes pixel-level features, geometrical structures and global context priors of an entire scene is presented and outperforms several state-of-the-art approaches.
SemanticFusion: Dense 3D semantic mapping with convolutional neural networks
TLDR
This work combines Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories, and produces a useful semantic 3D map.
Pix2Vox: Context-aware 3D Reconstruction from Single and Multi-view Images
TLDR
A novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox, which outperforms the state of the art by a large margin and is 24 times faster than 3D-R2N2 in terms of backward inference time.
Dual Attention Network for Scene Segmentation
TLDR
New state-of-the-art segmentation performance is achieved on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff, without using coarse data.
DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
TLDR
DenseFusion is a generic framework for estimating 6D pose of a set of known objects from RGB-D images that processes the two data sources individually and uses a novel dense fusion network to extract pixel-wise dense feature embedding, from which the pose is estimated.
...