Graph-Based Global Reasoning Networks

@inproceedings{chen2019graph,
  title={Graph-Based Global Reasoning Networks},
  author={Yunpeng Chen and Marcus Rohrbach and Zhicheng Yan and Shuicheng Yan and Jiashi Feng and Yannis Kalantidis},
  booktitle={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2019}
}
Globally modeling and reasoning over relations between regions can be beneficial for many computer vision tasks on both images and videos. Convolutional Neural Networks (CNNs) excel at modeling local relations by convolution operations, but they are typically inefficient at capturing global relations between distant regions and require stacking multiple convolution layers. In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the… 
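The abstract describes a pipeline of projecting coordinate-space features into a small interaction space, reasoning over a fully connected node graph, and projecting back. The following is a minimal NumPy sketch of that project/reason/reproject pattern; the function name `glore_unit`, the softmax assignment, and all shapes and weight names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def glore_unit(x, w_b, a_g, w_g):
    """One global reasoning pass over flattened features.

    x:   (L, C)  features at L = H*W spatial positions
    w_b: (C, N)  projection weights scoring each position against N nodes
    a_g: (N, N)  learned node adjacency
    w_g: (C, C)  node-state update weights
    """
    b = softmax(x @ w_b, axis=0)       # (L, N) soft assignment of positions to nodes
    v = b.T @ x                        # (N, C) aggregate node features (coordinate -> interaction space)
    n = v.shape[0]
    v = (np.eye(n) - a_g) @ v @ w_g    # graph convolution over the fully connected node graph
    y = b @ v                          # (L, C) re-project back to coordinate space
    return x + y                       # residual connection

rng = np.random.default_rng(0)
L, C, N = 16, 8, 4
out = glore_unit(rng.standard_normal((L, C)),
                 rng.standard_normal((C, N)) * 0.1,
                 rng.standard_normal((N, N)) * 0.1,
                 rng.standard_normal((C, C)) * 0.1)
```

Because the N-node graph is much smaller than the L spatial positions, a single pass relates all distant regions at far lower cost than stacking many convolution layers.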


Global Relation Reasoning Graph Convolutional Networks for Human Pose Estimation

Experiments show that GRR-GCN can boost the performance of state-of-the-art human pose estimation networks including SimpleBaseline and HRNet (High-Resolution Net).

Region-Based Global Reasoning Networks

This paper designs a region aggregation method that automatically gathers regional features into a uniform shape and adaptively adjusts their positions for better alignment, and proposes various relationship exploration methods applied to the regional features.

GINet: Graph Interaction Network for Scene Parsing

This work explores how to incorporate the linguistic knowledge to promote context reasoning over image regions by proposing a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss).

Dynamic Regions Graph Neural Networks for Spatio-Temporal Reasoning

This work focuses on modeling relations between instances, proposing a method that exploits the locality assumption to create nodes clearly localised in space, and achieves superior results on video classification tasks involving instance interactions.

Spatial Pyramid Based Graph Reasoning for Semantic Segmentation

This paper applies graph convolution into the semantic segmentation task and proposes an improved Laplacian, which gets rid of projecting and re-projecting processes and makes spatial pyramid possible to explore multiple long-range contextual patterns from different scales.

Towards Efficient Scene Understanding via Squeeze Reasoning

This paper explores the efficiency of context graph reasoning and proposes a novel framework called Squeeze Reasoning, which learns to squeeze the input feature into a channel-wise global vector and perform reasoning within the single vector where the computation cost can be significantly reduced.

Graph Reasoning Transformer for Image Parsing

A novel Graph Reasoning Transformer (GReaT) for image parsing enables image patches to interact following a relation reasoning pattern; results show that GReaT achieves consistent performance gains with slight computational overhead over state-of-the-art transformer baselines.

Visual Concept Reasoning Networks

This work proposes a split-transform-attend-interact-modulate-merge model, implemented as a highly modularized architecture, which consistently improves performance while increasing the number of parameters by less than 1%.

Exploit Visual Dependency Relations for Semantic Segmentation

A novel network architecture for semantic segmentation, termed the dependency network or DependencyNet, unifies dependency reasoning at three semantic levels; experimental results on two benchmark datasets show that DependencyNet achieves performance comparable to recent state-of-the-art methods.

Unified Graph Structured Models for Video Understanding

This paper proposes a message passing graph neural network that explicitly models these spatio-temporal relations and can use explicit representations of objects, when supervision is available, and implicit representations otherwise, and generalises previous structured models for video understanding.

Videos as Space-Time Region Graphs

The proposed graph representation achieves state-of-the-art results on the Charades and Something-Something datasets and obtains a huge gain when the model is applied in complex environments.

A simple neural network module for relational reasoning

This work shows how a deep learning architecture equipped with an RN module can implicitly discover and learn to reason about entities and their relations.

Beyond Grids: Learning Graph Representations for Visual Recognition

This work draws inspiration from region based recognition, and learns to transform a 2D image into a graph structure, which facilitates reasoning beyond regular grids and can capture long range dependencies among regions.

A2-Nets: Double Attention Networks

This work proposes the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently.
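The two-step gather-and-distribute mechanism described above can be sketched in NumPy: a first attention step pools the whole feature map into a handful of global descriptors, and a second attention step redistributes them to every position. The function name, the K-descriptor setup, and all shapes are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def double_attention(a, b, v):
    """Gather-then-distribute over L positions with K global descriptors.

    a: (L, C) input features
    b: (L, K) logits for the gathering attention
    v: (L, K) logits for the distribution attention
    """
    gather = softmax(b, axis=0)      # each of the K maps sums to 1 over positions
    g = a.T @ gather                 # (C, K) global descriptors (first attention: gather)
    distribute = softmax(v, axis=1)  # each position picks a mix of the K descriptors
    return distribute @ g.T          # (L, C) global features at every position (second attention)

rng = np.random.default_rng(1)
L, C, K = 12, 6, 3
out = double_attention(rng.standard_normal((L, C)),
                       rng.standard_normal((L, K)),
                       rng.standard_normal((L, K)))
```

Both attention steps are plain matrix products, so the cost scales with L*K rather than the L*L of pairwise position attention.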

Convolutional Random Walk Networks for Semantic Image Segmentation

This work introduces a simple, yet effective Convolutional Random Walk Network (RWN) that addresses the issues of poor boundary localization and spatially fragmented predictions with very little increase in model complexity.

Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning

It is shown that the graph convolution of the GCN model is actually a special form of Laplacian smoothing, which is the key reason why GCNs work, but it also brings potential concerns of over-smoothing with many convolutional layers.
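The Laplacian-smoothing view above can be made concrete with a small NumPy demo: applying the symmetrically normalized GCN propagation matrix repeatedly (with weights and nonlinearities stripped out for illustration) drives node features toward a single degree-scaled profile, which is the over-smoothing effect the paper analyzes. The graph and feature values below are an arbitrary toy example.

```python
import numpy as np

# Toy 4-node connected graph; GCN propagation S = D^-1/2 (A + I) D^-1/2.
a = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
a_hat = a + np.eye(4)                       # add self-loops
d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(1)))
s = d_inv_sqrt @ a_hat @ d_inv_sqrt         # one graph convolution = one smoothing step

h = np.array([[1.0], [0.0], [0.0], [5.0]])  # initially very different node features
for _ in range(200):
    h = s @ h                               # stacking many layers = repeated smoothing

# After many steps, h_i is proportional to sqrt(deg_i + 1): node identity is lost.
r = h[:, 0] / np.sqrt(a_hat.sum(1))
```

A shallow GCN benefits from a few smoothing steps (neighbors agree), but the demo shows why very deep stacks wash out the per-node signal.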

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

This work addresses semantic image segmentation with deep learning and proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, improving the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.

Dense and Low-Rank Gaussian CRFs Using Deep Embeddings

This work introduces a structured prediction model that endows the Deep Gaussian Conditional Random Field with a densely connected graph structure, and shows that the learned embeddings capture pixel-to-pixel affinities in a task-specific manner.

Densely Connected Convolutional Networks

The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion and has several compelling advantages: it alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.

PSANet: Point-wise Spatial Attention Network for Scene Parsing

The point-wise spatial attention network (PSANet) is proposed to relax the local neighborhood constraint and achieves top performance on various competitive scene parsing datasets, including ADE20K, PASCAL VOC 2012 and Cityscapes, demonstrating its effectiveness and generality.