Pyramid Scene Parsing Network

@inproceedings{zhao2017pspnet,
  title={Pyramid Scene Parsing Network},
  author={Hengshuang Zhao and Jianping Shi and Xiaojuan Qi and Xiaogang Wang and Jiaya Jia},
  booktitle={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2017}
}
Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. A single PSPNet yields a new record mIoU of 85.4% on PASCAL VOC 2012 and 80.2% accuracy on Cityscapes.
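The core of PSPNet is its pyramid pooling module: the backbone feature map is average-pooled at several bin sizes, each pooled map is upsampled back to the input resolution, and everything is concatenated as a global context prior. A minimal NumPy sketch of that idea, assuming the paper's usual bin sizes (1, 2, 3, 6); the real module also applies a 1x1 convolution per branch to reduce channels, and the function names here are illustrative:

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map into a (C, bins, bins) grid."""
    c, h, w = feat.shape
    out = np.zeros((c, bins, bins))
    for i in range(bins):
        for j in range(bins):
            hs, he = i * h // bins, (i + 1) * h // bins
            ws, we = j * w // bins, (j + 1) * w // bins
            out[:, i, j] = feat[:, hs:he, ws:we].mean(axis=(1, 2))
    return out

def nearest_upsample(feat, h, w):
    """Nearest-neighbour upsample (C, bh, bw) back to (C, H, W)."""
    rows = np.arange(h) * feat.shape[1] // h
    cols = np.arange(w) * feat.shape[2] // w
    return feat[:, rows][:, :, cols]

def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with pooled-and-upsampled context maps."""
    c, h, w = feat.shape
    branches = [feat]
    for b in bin_sizes:
        branches.append(nearest_upsample(adaptive_avg_pool(feat, b), h, w))
    return np.concatenate(branches, axis=0)

feat = np.random.rand(4, 12, 12)
out = pyramid_pooling(feat)
print(out.shape)  # (20, 12, 12): 4 input channels + 4 branches x 4 channels
```

The bin-size-1 branch is a pure global average, which is what gives every pixel access to scene-level context regardless of its receptive field.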

Multi-layer Feature Aggregation for Deep Scene Parsing Models

The effective use of multi-layer feature outputs of deep parsing networks for spatial-semantic consistency is explored by designing a novel feature aggregation module that generates an appropriate global representation prior to improve the discriminative power of features.

Attention Pyramid Module for Scene Recognition

This paper streamlines the multi-scale scene recognition pipeline, learns comprehensive scene features at various scales and locations, addresses the interdependency among scales, and further assists feature re-calibration as well as the aggregation process using the Attention Pyramid Module.

Semantic combined network for zero-shot scene parsing

A novel framework called semantic combined network (SCN) aims to learn a scene parsing model only from images of the seen classes while targeting the unseen ones, and performs well in both zero-shot scene parsing (ZSSP) and generalised ZSSP settings on top of several state-of-the-art scene parsing architectures.

SPNet: Superpixel Pyramid Network for Scene Parsing

Extensive experimental results on ADE20K, PASCAL VOC 2012, and CamVid demonstrate that the proposed Superpixel Pyramid Network obtains better performance than its counterparts.

Pyramid Attention Network for Semantic Segmentation

This work introduces a Feature Pyramid Attention module that applies a spatial pyramid attention structure to the high-level output and combines it with global pooling to learn a better feature representation, and a Global Attention Upsample module on each decoder layer that provides global context as guidance for low-level features to select category localization details.

Adaptive Context Network for Scene Parsing

  • J. Fu, Jing Liu, Hanqing Lu
  • Computer Science
    2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
This paper proposes an Adaptive Context Network (ACNet) to capture the pixel-aware contexts by a competitive fusion of global context and local context according to different per-pixel demands.

Scene Parsing via Tree Structure Enhancement Lightweight Network

A framework named tree structure enhancement lightweight network (TSELight) is proposed, which introduces depth-wise separable dilated convolution (DSDC) into the tree structure and decomposes the middle nodes of the tree structure along the channel direction, thus improving efficiency.

Recurrent Scene Parsing with Perspective Understanding in the Loop

This work proposes a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale so that small details are preserved for distant objects while larger receptive fields are used for those nearby.
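The gating idea can be caricatured in a toy NumPy sketch: pool over a large window where estimated depth is small (nearby objects) and leave distant pixels nearly untouched, preserving their fine detail. The paper learns this gate end-to-end from a predicted depth map; the hard median threshold and function names below are illustrative assumptions:

```python
import numpy as np

def avg_pool_window(feat, i, j, r):
    """Mean of feat over a (2r+1)^2 window centred at (i, j), clipped to bounds."""
    return feat[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1].mean()

def depth_gated_pooling(feat, depth, near_radius=3, far_radius=0):
    """Large pooling field where depth is small (nearby, large objects);
    small field where depth is large (distant, small objects)."""
    h, w = feat.shape
    thresh = np.median(depth)          # toy stand-in for a learned gate
    out = np.empty_like(feat)
    for i in range(h):
        for j in range(w):
            r = near_radius if depth[i, j] < thresh else far_radius
            out[i, j] = avg_pool_window(feat, i, j, r)
    return out

feat = np.random.rand(10, 10)
depth = np.random.rand(10, 10)
out = depth_gated_pooling(feat, depth)
```

With `far_radius=0`, pixels judged distant pass through unchanged, which is exactly the "small details preserved for distant objects" behaviour the abstract describes.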

Learning deep representations for semantic image parsing: a comprehensive overview

Three aspects of the progress of research on semantic image parsing are summarized, i.e., category-level semantic segmentation, instance-level semantic image parsing, and beyond segmentation.

Nonparametric scene parsing: Label transfer via dense scene alignment

Compared to existing object recognition approaches that require training for each object category, the proposed nonparametric scene parsing system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.

Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation

A multi-resolution reconstruction architecture based on a Laplacian pyramid that uses skip connections from higher resolution feature maps and multiplicative gating to successively refine segment boundaries reconstructed from lower-resolution maps is described.
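Underlying this architecture is the classic Laplacian pyramid: a signal is decomposed into per-level residuals plus a coarse base, and reconstruction proceeds coarse to fine by upsampling and adding the residuals back. A minimal NumPy sketch of that decomposition and reconstruction, assuming even spatial dimensions; the paper's skip connections and multiplicative gating replace these stored residuals with learned boundary refinements:

```python
import numpy as np

def down(x):
    """2x2 average pooling (assumes even H and W)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up(x):
    """2x nearest-neighbour upsampling."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def build_laplacian_pyramid(img, levels=3):
    """Return per-level residuals plus the coarsest low-resolution map."""
    residuals, cur = [], img
    for _ in range(levels):
        low = down(cur)
        residuals.append(cur - up(low))   # detail lost by downsampling
        cur = low
    return residuals, cur

def reconstruct(residuals, low):
    """Successively upsample and add residuals back, coarse to fine."""
    cur = low
    for res in reversed(residuals):
        cur = up(cur) + res
    return cur

img = np.random.rand(16, 16)
res, low = build_laplacian_pyramid(img)
recon = reconstruct(res, low)
print(np.allclose(recon, img))  # True: the decomposition is lossless
```

Because each residual stores exactly what downsampling discarded, reconstruction is lossless here; the segmentation network's gamble is that a small learned module can predict the boundary-relevant part of those residuals from higher-resolution features.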

Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation

This work shows how to improve semantic segmentation through the use of contextual information, specifically 'patch-patch' context between image regions and 'patch-background' context, and formulates Conditional Random Fields with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches.

Semantic Understanding of Scenes Through the ADE20K Dataset

This work presents a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that the networks trained on this dataset are able to segment a wide variety of scenes and objects.

Semantic Image Segmentation via Deep Parsing Network

This paper addresses semantic image segmentation by incorporating rich information into Markov Random Field (MRF), including high-order relations and mixture of label contexts by proposing a Convolutional Neural Network (CNN), namely Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass.

Convolutional Scale Invariance for Semantic Segmentation

A novel scale selection layer extracts convolutional features at the scale matching the corresponding reconstructed depth, freeing the pixel-level classifier from the need to learn the laws of perspective and resulting in improved segmentation.

Learning Deconvolution Network for Semantic Segmentation

A novel semantic segmentation algorithm by learning a deep deconvolution network on top of the convolutional layers adopted from VGG 16-layer net, which demonstrates outstanding performance in PASCAL VOC 2012 dataset.

Feedforward semantic segmentation with zoom-out features

This work introduces a purely feed-forward architecture for semantic segmentation that exploits statistical structure in the image and in the label space without setting up explicit structured prediction mechanisms, and thus avoids complex and expensive inference.

Fully convolutional networks for semantic segmentation

The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
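The "arbitrary input size, correspondingly-sized output" property follows from using only convolutions, whose output spatial extent tracks the input's. A toy NumPy sketch of a per-pixel classifier built from nothing but convolutions; real FCNs convolutionalize a pretrained classifier and add learned upsampling, and these helper names are illustrative:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive valid 2-D convolution of an (H, W) map with a (kh, kw) kernel."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

def fully_conv_scores(x, kernels):
    """One score map per class; output size tracks input size."""
    return np.stack([conv2d_valid(x, k) for k in kernels])

kernels = [np.random.rand(3, 3) for _ in range(5)]
for size in (8, 15, 32):
    scores = fully_conv_scores(np.random.rand(size, size), kernels)
    print(scores.shape)  # (5, size-2, size-2) for every input size
```

A fully connected layer would fix the input size; expressing the classifier as convolutions is what lets the same weights produce dense per-pixel scores at any resolution.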

The Cityscapes Dataset for Semantic Urban Scene Understanding

This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.