Corpus ID: 237532692

M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection

@article{Fang2021M2RNet,
  title={M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection},
  author={Xian Fang and Jinchao Zhu and Ruixun Zhang and Xiuli Shao and Hongpeng Wang},
  journal={ArXiv},
  year={2021}
}
  • Published 16 September 2021
  • Computer Science
  • ArXiv
Salient object detection is a fundamental topic in computer vision. Previous methods based on RGB-D often suffer from the incompatibility of multi-modal feature fusion and the insufficiency of multi-scale feature aggregation. To tackle these two dilemmas, we propose a novel multi-modal and multi-scale refined network (M2RNet). Three essential components are presented in this network. The nested dual attention module (NDAM) explicitly exploits the combined features of RGB and depth flows. The… 
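The attention-based fusion of RGB and depth features described above can be illustrated with a minimal sketch: the two modalities are summed element-wise, then the result is gated by channel attention followed by spatial attention. This is a generic plain-Python illustration under assumed simplifications, not the paper's actual NDAM implementation; all function names are hypothetical.

```python
import math

def channel_attention(features):
    """Rescale each channel by a sigmoid of its global-average response.

    `features` is a list of C channels, each an HxW grid (list of lists).
    """
    weighted = []
    for ch in features:
        gap = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        w = 1.0 / (1.0 + math.exp(-gap))  # sigmoid gate per channel
        weighted.append([[v * w for v in row] for row in ch])
    return weighted

def spatial_attention(features):
    """Weight every spatial location by a sigmoid of its cross-channel mean."""
    h, w = len(features[0]), len(features[0][0])
    mask = [[1.0 / (1.0 + math.exp(-sum(ch[i][j] for ch in features) / len(features)))
             for j in range(w)] for i in range(h)]
    return [[[ch[i][j] * mask[i][j] for j in range(w)] for i in range(h)]
            for ch in features]

def fuse_rgbd(rgb_feats, depth_feats):
    """Element-wise sum of the two modalities, then channel and spatial gating."""
    combined = [[[r + d for r, d in zip(rrow, drow)]
                 for rrow, drow in zip(rc, dc)]
                for rc, dc in zip(rgb_feats, depth_feats)]
    return spatial_attention(channel_attention(combined))
```

Because both gates are sigmoids in (0, 1), the fused response is always a damped version of the raw sum, with locations and channels of weak combined activation suppressed most strongly.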


BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network
This paper proposes a bifurcated backbone strategy (BBS) to split the multi-level features of RGB-D salient object detection into teacher and student features, and utilizes a depth-enhanced module (DEM) to excavate informative parts of depth cues from the channel and spatial views.
Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection
A hybrid enhanced loss function is designed to give the predictions sharper edges and consistent saliency regions, and a dynamic dilated pyramid module implements more flexible and efficient multi-scale cross-modal feature processing.
Three-Stream Attention-Aware Network for RGB-D Salient Object Detection
  • H. Chen, Youfu Li
  • Computer Science, Medicine
    IEEE Transactions on Image Processing
  • 2019
In the proposed architecture, a cross-modal distillation stream, accompanying the RGB-specific and depth-specific streams, is introduced to extract new RGB-D features at each level in the bottom–up path, and a channel-wise attention mechanism is innovatively applied to the cross-modal cross-level fusion problem to adaptively select complementary feature maps from each modality at each level.
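Channel-wise selection of complementary features across modalities, as described above, can be sketched roughly as a per-channel softmax over the two modalities' global-average responses. This is a hypothetical plain-Python simplification for illustration, not the network's actual implementation.

```python
import math

def crossmodal_channel_select(rgb_feats, depth_feats):
    """Per-channel soft selection between two modalities.

    For each channel, a softmax over the modalities' global-average
    responses decides how much each modality contributes, approximating
    adaptive complementary-feature selection.
    """
    fused = []
    for rc, dc in zip(rgb_feats, depth_feats):
        n = len(rc) * len(rc[0])
        g_rgb = sum(sum(row) for row in rc) / n    # global average, RGB
        g_dep = sum(sum(row) for row in dc) / n    # global average, depth
        e_rgb, e_dep = math.exp(g_rgb), math.exp(g_dep)
        w_rgb = e_rgb / (e_rgb + e_dep)            # softmax over modalities
        w_dep = e_dep / (e_rgb + e_dep)
        fused.append([[w_rgb * r + w_dep * d for r, d in zip(rrow, drow)]
                      for rrow, drow in zip(rc, dc)])
    return fused
```

The modality whose channel responds more strongly on average dominates that channel of the fused output, while the weights always sum to one.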
Multi-Scale Interactive Network for Salient Object Detection
The consistency-enhanced loss is exploited to highlight the fore-/back-ground difference and preserve intra-class consistency, while aggregate interaction modules integrate the features from adjacent levels, introducing less noise because only small up-/down-sampling rates are used.
ICNet: Information Conversion Network for RGB-D Based Salient Object Detection
A novel Information Conversion Network (ICNet) is proposed for RGB-D based SOD, employing a siamese encoder-decoder structure that contains concatenation operations and correlation layers, along with a Cross-modal Depth-weighted Combination block to discriminate cross-modal features from different sources and to enhance RGB features with depth features at each level.
Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection
A novel multi-scale multi-path fusion network with cross-modal interactions (MMCI), in which the traditional two-stream fusion architecture with a single fusion path is advanced by diversifying the fusion path into a global reasoning path and a local capturing path, while introducing cross-modal interactions in multiple layers.
Cross-Modal Weighting Network for RGB-D Salient Object Detection
This paper proposes a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD, and designs a composite loss function that summarizes the errors between intermediate predictions and ground truth over different scales.
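A composite loss that sums errors between intermediate predictions and ground truth over multiple scales, as described above, can be sketched as follows. The function names and the choice of binary cross-entropy are illustrative assumptions, not CMW's exact formulation.

```python
import math

def bce(pred, gt):
    """Mean binary cross-entropy over a flat list of pixel probabilities."""
    eps = 1e-7  # avoid log(0)
    return -sum(g * math.log(p + eps) + (1 - g) * math.log(1 - p + eps)
                for p, g in zip(pred, gt)) / len(pred)

def composite_loss(preds_per_scale, gts_per_scale, weights=None):
    """Sum (optionally weighted) per-scale losses between intermediate
    predictions and ground truth, in the style of deep supervision."""
    if weights is None:
        weights = [1.0] * len(preds_per_scale)
    return sum(w * bce(p, g)
               for w, p, g in zip(weights, preds_per_scale, gts_per_scale))
```

Supervising every intermediate scale this way pushes each decoder stage toward the ground truth instead of relying on the final prediction alone.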
A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection
A single stream network to directly use the depth map to guide early fusion and middle fusion between RGB and depth, which saves the feature encoder of the depth stream and achieves a lightweight and real-time model.
JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection
The JL module provides robust saliency feature learning, while the densely-cooperative fusion is introduced for complementary feature discovery; the designed framework yields a robust RGB-D saliency detector with good generalization.
Contrast Prior and Fluid Pyramid Integration for RGBD Salient Object Detection
Contrast prior, which used to be a dominant cue in non-deep-learning SOD approaches, is utilized in a CNN-based architecture to enhance the depth information, which is then integrated with RGB features for SOD using a novel fluid pyramid integration.