Scale-Adaptive Convolutions for Scene Parsing

Rui Zhang, Sheng Tang, Yongdong Zhang, Jintao Li, Shuicheng Yan
2017 IEEE International Conference on Computer Vision (ICCV)
Many existing scene parsing methods adopt Convolutional Neural Networks with fixed-size receptive fields, which frequently result in inconsistent predictions for large objects and in small objects being missed. [...] By adding a new scale regression layer, we can dynamically infer position-adaptive scale coefficients, which are used to resize the convolutional patches.
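The idea of resizing a convolutional patch with a per-position scale coefficient can be illustrated with a small sketch. This is not the authors' implementation: the function names `bilinear` and `scale_adaptive_conv`, the 3x3 kernel, the bilinear sampling, and the zero padding are all assumptions made for illustration, and the scale coefficients are passed in directly rather than predicted by a scale regression layer.

```python
import math

def bilinear(feat, y, x):
    """Sample a 2D list `feat` at real-valued (y, x); points outside the map count as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * feat[yy][xx]
    return val

def scale_adaptive_conv(feat, kernel, scales):
    """3x3 convolution whose sampling offsets at (i, j) are multiplied by scales[i][j].

    Hypothetical sketch: in the paper the coefficients come from a learned
    scale regression layer; here they are simply given as input.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = scales[i][j]
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    # A larger s spreads the nine taps over a larger patch.
                    acc += kernel[di + 1][dj + 1] * bilinear(feat, i + s * di, j + s * dj)
            out[i][j] = acc
    return out

# Horizontal ramp: feat[i][j] = j, filtered with a horizontal difference kernel.
ramp = [[float(j) for j in range(5)] for _ in range(5)]
diff_kernel = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]
for s in (0.5, 1.0, 2.0):
    scales = [[s] * 5 for _ in range(5)]
    print(s, scale_adaptive_conv(ramp, diff_kernel, scales)[2][2])
```

With the same kernel weights, the response at the centre pixel grows with the scale coefficient because the taps are spread over a wider patch. A coefficient of 1 reproduces an ordinary 3x3 convolution.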

Perspective-Adaptive Convolutions for Scene Parsing

This work proposes perspective-adaptive convolutions to acquire receptive fields of flexible sizes and shapes during scene parsing by adding a new perspective regression layer, which dynamically infers the position-adaptive perspective coefficient vectors used to reshape the convolutional patches.

Scale-Aware Segmentation of Multiple-Scale Objects in Aerial Images

This work proposes a light-weight scale-aware module (SWD) that is end-to-end differentiable, can be embedded in most existing networks, and explicitly adjusts the receptive field size to fit target objects of different scales, alleviating the problem of fixed square receptive fields.

Multi-layer Feature Aggregation for Deep Scene Parsing Models

The effective use of multi-layer feature outputs of the deep parsing networks for spatial-semantic consistency is explored by designing a novel feature aggregation module to generate the appropriate global representation prior, to improve the discriminative power of features.

Rethinking Atrous Convolution for Semantic Image Segmentation

The proposed 'DeepLabv3' system significantly improves over the previous DeepLab versions without DenseCRF post-processing and attains comparable performance with other state-of-the-art models on the PASCAL VOC 2012 semantic image segmentation benchmark.

Malleable 2.5D Convolution: Learning Receptive Fields along the Depth-axis for RGB-D Scene Parsing

This paper proposes a novel operator called malleable 2.5D convolution to learn the receptive field along the depth-axis, formulated as a differentiable form so that it can be learnt by gradient descent.

Scene Parsing Via Dense Recurrent Neural Networks With Attentional Selection

This paper proposes dense RNNs for scene labeling by exploring various long-range semantic dependencies among image units and develops an end-to-end scene labeling system integrating with convolutional neural networks (CNNs).

Inception Convolution with Efficient Dilation Search

  • Jie Liu, Chuming Li, Dong Xu
  • Computer Science
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
A new type of dilated convolution is proposed, where the convolution operations have independent dilation patterns among different axes, channels and layers, and this method achieves consistent performance gains for image recognition, object detection, instance segmentation, human detection, and human pose estimation.

Deep Multiphase Level Set for Scene Parsing

This paper proposes a novel Deep Multiphase Level Set (DMLS) method for semantic scene parsing, which efficiently incorporates multiphase level sets into deep neural networks.

Dynamic Multi-Scale Filters for Semantic Segmentation

A Dynamic Multi-scale Network (DMNet) is proposed to adaptively capture multi-scale contents for predicting pixel-level semantic labels and obtains state-of-the-art performance on Pascal-Context and ADE20K.



DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

This work addresses the task of semantic image segmentation with Deep Learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.

Understanding Convolution for Semantic Segmentation

Dense upsampling convolution (DUC) is designed to generate pixel-level predictions and can capture and decode more detailed information that is generally missing in bilinear upsampling; a hybrid dilated convolution (HDC) framework is proposed for the encoding phase.

Attention to Scale: Scale-Aware Semantic Image Segmentation

An attention mechanism that learns to softly weight the multi-scale features at each pixel location is proposed, which not only outperforms average- and max-pooling, but allows us to diagnostically visualize the importance of features at different positions and scales.
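Soft weighting of multi-scale features at each pixel can be sketched as a softmax over scales followed by a weighted sum. The names `softmax` and `attention_merge` and the nested-list layout `[scale][row][col]` are assumptions for illustration; the attention logits would come from a learned branch in a real model and are supplied by hand here.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_merge(score_maps, attention_logits):
    """Merge per-scale score maps with per-pixel softmax weights over scales.

    score_maps[s][i][j] and attention_logits[s][i][j] share the same shape.
    """
    n_scales = len(score_maps)
    h, w = len(score_maps[0]), len(score_maps[0][0])
    merged = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            weights = softmax([attention_logits[s][i][j] for s in range(n_scales)])
            merged[i][j] = sum(weights[s] * score_maps[s][i][j] for s in range(n_scales))
    return merged

# Two scales, one pixel: equal logits give the plain average of the scores.
scores = [[[1.0]], [[3.0]]]
print(attention_merge(scores, [[[0.0]], [[0.0]]]))   # average of 1.0 and 3.0
print(attention_merge(scores, [[[0.0]], [[20.0]]]))  # dominated by the second scale
```

Equal logits reduce the merge to average-pooling over scales, while one very large logit approaches selecting a single scale, so both fixed pooling schemes appear as special cases of the learned weighting.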

Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation

This work shows how to improve semantic segmentation through the use of contextual information, specifically 'patch-patch' context between image regions and 'patch-background' context, and formulates Conditional Random Fields with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches.

Multi-Scale Context Aggregation by Dilated Convolutions

This work develops a new convolutional network module that is specifically designed for dense prediction, and shows that the presented context module increases the accuracy of state-of-the-art semantic segmentation systems.
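The summary above does not describe the module's internals, but the title points to dilated convolutions. As a general illustration of why dilation helps aggregate context (not a reproduction of the paper's module), the receptive field of a stack of stride-1 convolutions can be computed from the dilation rates. The function name `receptive_field` and the example dilation lists are assumptions.

```python
def receptive_field(dilations, kernel_size=3):
    """Receptive field (in pixels, along one axis) of stacked stride-1 convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Three ordinary 3x3 layers versus three layers with doubling dilation.
print(receptive_field([1, 1, 1]))  # 7
print(receptive_field([1, 2, 4]))  # 15
```

With the same number of layers and parameters, doubling the dilation at each layer grows the receptive field much faster than stacking undilated layers, and it does so without reducing resolution.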

Semantic Image Segmentation via Deep Parsing Network

This paper addresses semantic image segmentation by incorporating rich information, including high-order relations and a mixture of label contexts, into a Markov Random Field (MRF). It proposes a Convolutional Neural Network (CNN), namely the Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass.

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.

Multi-Path Feedback Recurrent Neural Networks for Scene Parsing

This paper considers the scene parsing problem and proposes a novel Multi-Path Feedback recurrent neural network (MPF-RNN), which can enhance the capability of RNNs in modeling long-range context information at multiple levels and better distinguish pixels that are easy to confuse.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions

This work proposes a Global-residual Refinement Network (GRN) that exploits global contextual information to predict the parsing residuals and iteratively smooth the inconsistent parsing labels, and a Local-boundary Refinement Network (LRN) that learns position-adaptive propagation coefficients so that local contextual information from neighbors can be optimally captured for refining object boundaries.