Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer

Liang Lin, Yiming Gao, Ke Gong, Meng Wang, Xiaodan Liang
IEEE Transactions on Pattern Analysis and Machine Intelligence
Prior highly-tuned image parsing models are usually studied in a certain domain with a specific set of semantic labels and can hardly be adapted to other scenarios (e.g., sharing discrepant label granularity) without extensive re-training. Learning a single universal parsing model by unifying label annotations from different domains or at various levels of granularity is a crucial but rarely addressed topic. This poses many fundamental learning challenges, e.g., discovering underlying semantic…
FLOAT: Factorized Learning of Object Attributes for Improved Multi-object Multi-part Scene Parsing
The FLOAT framework involves independent dense prediction of object category and part attributes which increases scalability and reduces task complexity compared to the monolithic label space counterpart, and proposes an inference-time ‘zoom’ refinement technique which significantly improves segmentation quality, especially for smaller objects/parts.


Semantic Object Parsing with Graph LSTM
The Graph Long Short-Term Memory network is proposed, which is the generalization of LSTM from sequential data or multi-dimensional data to general graph-structured data.
Dynamic-Structured Semantic Propagation Network
A Dynamic-Structured Semantic Propagation Network (DSSPN) is proposed that builds a semantic neuron graph by explicitly incorporating the semantic concept hierarchy into network construction and demonstrates the superiority of the DSSPN over state-of-the-art segmentation models.
Instance-level Human Parsing via Part Grouping Network
This work makes the first attempt to explore a detection-free Part Grouping Network (PGN) for efficiently parsing multiple people in an image in a single pass, and outperforms all state-of-the-art methods on the PASCAL-Person-Part dataset.
Multi-label Zero-Shot Learning with Structured Knowledge Graphs
This work proposes a novel deep learning architecture for multi-label zero-shot learning (ML-ZSL) that can predict multiple unseen class labels for each input instance, together with a framework that incorporates knowledge graphs to describe the relationships between multiple labels.
Instance-Aware Semantic Segmentation via Multi-task Network Cascades
Jifeng Dai, Kaiming He, Jian Sun. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
This paper presents Multitask Network Cascades for instance-aware semantic segmentation, which consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects, and develops an algorithm for the nontrivial end-to-end training of this causal, cascaded structure.
Semantic Object Parsing with Local-Global Long Short-Term Memory
This work proposes a novel deep Local-Global Long Short-Term Memory (LG-LSTM) architecture that seamlessly incorporates short-distance and long-distance spatial dependencies into the feature learning over all pixel positions, and demonstrates the significant superiority of LG-LSTM over other state-of-the-art methods.
Iterative Visual Reasoning Beyond Convolutions
Analysis shows that the framework is resilient to missing regions for reasoning and shows strong performance over plain ConvNets, e.g. achieving an 8.4% absolute improvement on ADE measured by per-class average precision.
Zero-Shot Recognition via Semantic Embeddings and Knowledge Graphs
X. Wang, Yufei Ye, A. Gupta. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
This paper builds upon the recently introduced Graph Convolutional Network (GCN) and proposes an approach that uses both semantic embeddings and the categorical relationships to predict the classifiers, and shows that it is robust to noise in the KG.
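As a rough illustration of the graph convolution this summary refers to, the sketch below implements one layer in the commonly used normalized form, ReLU(D^-1/2 (A + I) D^-1/2 H W). It is not the paper's implementation: the function names `matmul` and `gcn_layer`, the plain-list matrices, and the toy graph are all assumptions made for the example.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    : n x n adjacency matrix with non-negative entries (hypothetical input)
    feats  : n x d node-feature matrix H
    weight : d x m weight matrix W
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees of the self-looped graph (always >= 1, so no division by zero).
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy graph: two connected nodes, one feature each.
adj = [[0, 1], [1, 0]]
feats = [[1.0], [3.0]]
weight = [[1.0]]
print(gcn_layer(adj, feats, weight))
```

On this toy graph each node averages its own feature with its neighbor's, so both outputs are equal. In the zero-shot setting the summary describes, the node features would be semantic embeddings of categories and the outputs would be predicted classifiers.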
Graph Attention Networks
We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
This work addresses the task of semantic image segmentation with Deep Learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.
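Atrous (dilated) convolution can be illustrated in one dimension: the kernel taps are spaced `rate` samples apart, which enlarges the field of view without adding weights. The sketch below is a minimal pure-Python version under that assumption. The function name `atrous_conv1d` and the toy signal are hypothetical and are not taken from DeepLab's code.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1-D atrous (dilated) correlation.

    Kernel taps are applied `rate` samples apart, so the kernel covers
    (len(kernel) - 1) * rate + 1 input samples with the same number of weights.
    """
    if rate < 1 or not kernel:
        raise ValueError("rate must be >= 1 and kernel must be non-empty")
    span = (len(kernel) - 1) * rate + 1  # effective field of view
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(n_out)
    ]


signal = [0, 1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
dense = atrous_conv1d(signal, kernel, rate=1)    # taps at offsets 0, 1, 2
dilated = atrous_conv1d(signal, kernel, rate=2)  # taps at offsets 0, 2, 4
print(dense)    # [-2, -2, -2, -2, -2]
print(dilated)  # [-4, -4, -4]
```

ASPP applies several such filters with different rates in parallel to the same feature map and fuses their responses. The 2-D case follows the same indexing along both spatial axes.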