Pyramid Scene Parsing Network
@inproceedings{Zhao2016PyramidSP, title={Pyramid Scene Parsing Network}, author={Hengshuang Zhao and Jianping Shi and Xiaojuan Qi and Xiaogang Wang and Jiaya Jia}, booktitle={2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year={2017}, pages={6230-6239} }
Scene parsing is challenging due to its unrestricted open vocabulary and diverse scenes. Key Result: A single PSPNet yields new records of 85.4% mIoU accuracy on PASCAL VOC 2012 and 80.2% accuracy on Cityscapes.
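As a rough illustration of the paper's central component, the sketch below shows a PSPNet-style pyramid pooling module: features are average-pooled into 1x1, 2x2, 3x3, and 6x6 bins, reduced with 1x1 convolutions, upsampled, and concatenated with the input map. The bin sizes follow the paper; the channel counts, layer names, and surrounding wiring are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a PSPNet-style pyramid pooling module (PyTorch).
# Bin sizes {1, 2, 3, 6} follow the paper; channel counts and layer
# names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_channels, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        reduced = in_channels // len(bin_sizes)  # shrink each branch
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),                    # pool to size x size bins
                nn.Conv2d(in_channels, reduced, 1, bias=False),
                nn.BatchNorm2d(reduced),
                nn.ReLU(inplace=True),
            )
            for size in bin_sizes
        )

    def forward(self, x):
        h, w = x.shape[2:]
        # Upsample each pooled branch back to the input resolution and
        # concatenate with the original map (global prior + local cues).
        pooled = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return torch.cat([x] + pooled, dim=1)

# Example: a 2048-channel backbone feature map at 1/8 resolution.
feats = torch.randn(1, 2048, 60, 60)
out = PyramidPooling(2048)(feats)   # -> (1, 4096, 60, 60)
```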
7,511 Citations
Multi-layer Feature Aggregation for Deep Scene Parsing Models
- Computer Science · ArXiv
- 2020
The effective use of multi-layer feature outputs of deep parsing networks for spatial-semantic consistency is explored by designing a novel feature aggregation module that generates an appropriate global representation prior, improving the discriminative power of features.
Attention Pyramid Module for Scene Recognition
- Computer Science · 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021
This paper streamlines the multi-scale scene recognition pipeline, learns comprehensive scene features at various scales and locations, addresses the interdependency among scales, and further assists feature re-calibration as well as the aggregation process using the Attention Pyramid Module.
Semantic combined network for zero-shot scene parsing
- Computer Science · IET Image Process.
- 2020
A novel framework called semantic combined network (SCN) aims to learn a scene parsing model only from images of the seen classes while targeting the unseen ones, and performs well in both the zero-shot scene parsing (ZSSP) and generalised ZSSP settings on top of several state-of-the-art scene parsing architectures.
SPNet: Superpixel Pyramid Network for Scene Parsing
- Computer Science · 2018 Chinese Automation Congress (CAC)
- 2018
Extensive experimental results on ADE20K, PASCAL VOC 2012, and CamVid demonstrate that the proposed Superpixel Pyramid Network obtains better performance than its counterparts.
Pyramid Attention Network for Semantic Segmentation
- Computer Science · BMVC
- 2018
This work introduces a Feature Pyramid Attention module that applies a spatial pyramid attention structure to the high-level output and combines it with global pooling to learn a better feature representation, and a Global Attention Upsample module on each decoder layer that provides global context as guidance for low-level features to select category localization details.
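Going only by the summary above, a Global Attention Upsample-style decoder block might be sketched as follows: globally pooled high-level features act as a channel gate on low-level features before the two streams are merged. The class name, channel arguments, and the sigmoid gate are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of a Global-Attention-Upsample-like decoder block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttentionUpsample(nn.Module):
    def __init__(self, low_channels, high_channels, out_channels):
        super().__init__()
        self.low_conv = nn.Conv2d(low_channels, out_channels, 3, padding=1, bias=False)
        self.high_conv = nn.Conv2d(high_channels, out_channels, 1, bias=False)
        self.gate = nn.Sequential(                      # global context as a channel gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(high_channels, out_channels, 1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, low, high):
        low = self.low_conv(low)                        # refine low-level details
        weight = self.gate(high)                        # (N, C, 1, 1) global guidance
        high = F.interpolate(self.high_conv(high), size=low.shape[2:],
                             mode="bilinear", align_corners=False)
        return high + low * weight                      # gated low-level + upsampled high-level
```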
Adaptive Context Network for Scene Parsing
- Computer Science · 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
This paper proposes an Adaptive Context Network (ACNet) to capture the pixel-aware contexts by a competitive fusion of global context and local context according to different per-pixel demands.
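The per-pixel competitive fusion described above could be sketched roughly as below: a learned gate decides, for each pixel, how much to draw from a globally pooled descriptor versus a locally convolved one. This illustrates the general idea only, not the authors' exact ACNet design.

```python
# Illustrative per-pixel gated fusion of global and local context.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedContextFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.global_proj = nn.Conv2d(channels, channels, 1, bias=False)
        self.gate = nn.Conv2d(channels, 1, 1)            # per-pixel demand for global context

    def forward(self, x):
        local_ctx = self.local(x)
        global_ctx = self.global_proj(F.adaptive_avg_pool2d(x, 1)).expand_as(x)
        g = torch.sigmoid(self.gate(x))                   # (N, 1, H, W), broadcast over channels
        return g * global_ctx + (1 - g) * local_ctx
```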
PCANet: Pyramid convolutional attention network for semantic segmentation
- Computer Science · Image Vis. Comput.
- 2020
Scene Parsing via Tree Structure Enhancement Lightweight Network
- Computer Science · 2022 International Conference on Intelligent Education and Intelligent Research (IEIR)
- 2022
A framework named tree structure enhancement lightweight network (TSELight) is proposed, which introduces depth-wise separable dilated convolution (DSDC) into the tree structure and decomposes the middle nodes of the tree structure along the channel direction, thus improving efficiency.
Recurrent Scene Parsing with Perspective Understanding in the Loop
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This work proposes a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale so that small details are preserved for distant objects while larger receptive fields are used for those nearby.
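A minimal sketch of that gating idea follows, with the simplifying assumption that the per-pixel branch weights are predicted from the feature map itself rather than from an explicit depth estimate; the pooling sizes and module name are illustrative, not the paper's.

```python
# Illustrative soft selection among pooling field sizes via per-pixel gating.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthAwareGating(nn.Module):
    def __init__(self, channels, pool_sizes=(1, 3, 5)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.gate = nn.Conv2d(channels, len(pool_sizes), 1)   # per-pixel branch logits

    def forward(self, x):
        # Each branch smooths the features with a different pooling field size.
        branches = [
            x if k == 1 else F.avg_pool2d(x, kernel_size=k, stride=1, padding=k // 2)
            for k in self.pool_sizes
        ]
        weights = torch.softmax(self.gate(x), dim=1)           # (N, num_branches, H, W)
        return sum(w.unsqueeze(1) * b for w, b in zip(weights.unbind(dim=1), branches))
```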
Learning deep representations for semantic image parsing: a comprehensive overview
- Computer Science · Frontiers of Computer Science
- 2018
Three aspects of the progress of research on semantic image parsing are summarized, i.e., category-level semantic segmentation, instance-level semantic image parsing, and beyond segmentation.
References
SHOWING 1-10 OF 47 REFERENCES
Nonparametric scene parsing: Label transfer via dense scene alignment
- Computer Science · 2009 IEEE Conference on Computer Vision and Pattern Recognition
- 2009
Compared to existing object recognition approaches that require training for each object category, the proposed nonparametric scene parsing system is easy to implement, has few parameters, and embeds contextual information naturally in the retrieval/alignment procedure.
Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation
- Computer Science · ECCV
- 2016
A multi-resolution reconstruction architecture based on a Laplacian pyramid that uses skip connections from higher resolution feature maps and multiplicative gating to successively refine segment boundaries reconstructed from lower-resolution maps is described.
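One refinement step of such a pyramid could look roughly like the sketch below: a coarse prediction is upsampled, and a multiplicatively gated residual computed from higher-resolution skip features is added. Layer shapes and the exact gating form are assumptions for illustration, not the paper's architecture.

```python
# Sketch of one Laplacian-pyramid refinement step with multiplicative gating.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementStep(nn.Module):
    def __init__(self, skip_channels, num_classes):
        super().__init__()
        self.residual = nn.Conv2d(skip_channels, num_classes, 3, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(skip_channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, coarse_logits, skip_feats):
        up = F.interpolate(coarse_logits, size=skip_feats.shape[2:],
                           mode="bilinear", align_corners=False)
        # The gate suppresses the high-resolution residual away from likely boundaries.
        return up + self.gate(skip_feats) * self.residual(skip_feats)
```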
Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work shows how to improve semantic segmentation through the use of contextual information, specifically 'patch-patch' context between image regions and 'patch-background' context, and formulates Conditional Random Fields with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches.
Semantic Understanding of Scenes Through the ADE20K Dataset
- Computer Science · International Journal of Computer Vision
- 2018
This work presents a densely annotated dataset ADE20K, which spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts, and shows that the networks trained on this dataset are able to segment a wide variety of scenes and objects.
Semantic Image Segmentation via Deep Parsing Network
- Computer Science · 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
This paper addresses semantic image segmentation by incorporating rich information into a Markov Random Field (MRF), including high-order relations and a mixture of label contexts, and proposes a Convolutional Neural Network (CNN), namely the Deep Parsing Network (DPN), which enables deterministic end-to-end computation in a single forward pass.
Convolutional Scale Invariance for Semantic Segmentation
- Computer Science · GCPR
- 2016
A novel scale selection layer extracts convolutional features at the scale matching the corresponding reconstructed depth, freeing the pixel-level classifier from the need to learn the laws of perspective and resulting in improved segmentation.
Learning Deconvolution Network for Semantic Segmentation
- Computer Science · 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
A novel semantic segmentation algorithm learns a deep deconvolution network on top of the convolutional layers adopted from the VGG 16-layer net and demonstrates outstanding performance on the PASCAL VOC 2012 dataset.
Feedforward semantic segmentation with zoom-out features
- Computer Science · CVPR
- 2015
This work introduces a purely feed-forward architecture for semantic segmentation that exploits statistical structure in the image and in the label space without setting up explicit structured prediction mechanisms, and thus avoids complex and expensive inference.
Fully convolutional networks for semantic segmentation
- Computer Science · 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
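A toy sketch of that insight: a convolutional backbone, a 1x1 convolution acting as a per-location classifier, and bilinear upsampling back to the input resolution, so inputs of arbitrary size yield correspondingly sized dense label maps. The backbone here is a stand-in for illustration, not the FCN architecture from the paper.

```python
# Minimal fully convolutional segmentation sketch with a toy backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = nn.Sequential(                  # downsamples by 4 overall
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(128, num_classes, 1)   # per-location classifier

    def forward(self, x):
        logits = self.classifier(self.backbone(x))
        # Upsample coarse logits to the input size for pixel-wise prediction.
        return F.interpolate(logits, size=x.shape[2:], mode="bilinear", align_corners=False)

# Works for arbitrary input sizes:
print(TinyFCN()(torch.randn(1, 3, 200, 300)).shape)     # torch.Size([1, 21, 200, 300])
```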
The Cityscapes Dataset for Semantic Urban Scene Understanding
- Computer Science, Environmental Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work introduces Cityscapes, a benchmark suite and large-scale dataset to train and test approaches for pixel-level and instance-level semantic labeling, and exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.