Unifying Nonlocal Blocks for Neural Networks

  • Lei Zhu, Qi She, Duo Li, Yanye Lu, Xuejing Kang, Jie Hu, Changhu Wang
  • Published 5 August 2021
  • Computer Science
  • 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Nonlocal-based blocks are designed to capture long-range spatial-temporal dependencies in computer vision tasks. Although they have shown excellent performance, they still lack a mechanism to encode the rich, structured information among elements in an image or video. In this paper, to theoretically analyze the properties of these nonlocal-based blocks, we provide a new perspective to interpret them, where we view them as a set of graph filters generated on a fully-connected graph…
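To make the graph reading in the abstract concrete, here is a minimal NumPy sketch of a standard embedded-Gaussian nonlocal operation, as in the "Non-local Neural Networks" paper listed further down this page. It is not the unified block this paper proposes. The function name `nonlocal_block`, the weight names, and the toy sizes are all hypothetical and chosen only for illustration. The point it shows is that the softmax-normalized affinity matrix is dense and row-stochastic, so it can be read as a normalized adjacency matrix on a fully-connected graph over the N positions.

```python
import numpy as np


def softmax(a, axis=-1):
    # Numerically stable softmax along the given axis.
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def nonlocal_block(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian nonlocal operation with a residual connection.

    x: (N, C) array, one C-dimensional feature per position (N positions).
    Returns the output features (N, C) and the (N, N) affinity matrix.
    All weight matrices are hypothetical stand-ins for 1x1 convolutions.
    """
    theta = x @ w_theta  # (N, C') query embedding
    phi = x @ w_phi      # (N, C') key embedding
    g = x @ w_g          # (N, C') value embedding

    # Pairwise affinities between every pair of positions, normalized per row.
    affinity = softmax(theta @ phi.T, axis=-1)  # (N, N)

    y = affinity @ g     # aggregate features from all positions
    z = x + y @ w_z      # project back to C channels and add the residual
    return z, affinity


rng = np.random.default_rng(0)
N, C, C_inner = 6, 8, 4  # toy sizes, assumed for the example
x = rng.standard_normal((N, C))
w_theta = 0.1 * rng.standard_normal((C, C_inner))
w_phi = 0.1 * rng.standard_normal((C, C_inner))
w_g = 0.1 * rng.standard_normal((C, C_inner))
w_z = 0.1 * rng.standard_normal((C_inner, C))

z, affinity = nonlocal_block(x, w_theta, w_phi, w_g, w_z)
```

Every entry of `affinity` is positive and each row sums to one, so every position is connected to every other position. In that sense the aggregation step `affinity @ g` acts like a single propagation step over a fully-connected graph.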

MLPT: Multilayer Perceptron based Tracking

This paper presents a simple yet effective Multilayer Perceptron-based Tracking (MLPT) method with a global receptive field, described as the first MLP-based baseline architecture for object tracking.

Attention-Augmented Memory Network for Image Multi-Label Classification

Experimental results on standard benchmarks, including MS-COCO 2014, PASCAL VOC 2007, and VG-500, demonstrate the effectiveness of the AAMN model, which outperforms current state-of-the-art methods.

Segmentation and Measurement of Superalloy Microstructure Based on Improved Nonlocal Block

The microstructure of superalloy materials has a decisive impact on its service performance. When preparing the material and photographing the microstructure, different depths of metallography

Classifying Facial Regions for Face Hallucination

FRCN first divides the input low-resolution facial image into several patch blocks, then classifies them into three categories according to their reconstruction difficulty, in order to recover a high-quality high-resolution (HR) facial image. Experimental results show that FRCN can remarkably improve face reconstruction performance.

CoF-Net: A Progressive Coarse-to-Fine Framework for Object Detection in Remote-Sensing Imagery

  • Cong Zhang, K. Lam, Qi Wang
  • Environmental Science, Computer Science
    IEEE Transactions on Geoscience and Remote Sensing
  • 2023
A novel coarse-to-fine framework (CoF-Net) is proposed for object detection in remote-sensing imagery that smoothly refines the original coarse features into multispectral nonlocal fine features with discriminative spatial–spectral details and semantic relations.

Bagging Regional Classification Activation Maps for Weakly Supervised Object Localization

This paper describes a plug-and-play mechanism called BagCAMs that better projects a well-trained classifier onto the localization task without refining or re-training the baseline structure, and that can improve the performance of baseline WSOL methods to a great extent.

Prediction of Prospecting Target Based on Selective Transfer Network

In recent years, with the integration and development of artificial intelligence technology and geology, traditional geological prospecting has begun to change to intelligent prospecting. Intelligent

WHU-OHS: A benchmark dataset for large-scale Hyperspectral Image classification

Compact Generalized Non-local Network

This extension uses a compact representation of multiple kernel functions via Taylor expansion, which gives the generalized non-local module a fast, low-complexity computation flow, and applies the generalized non-local method within channel groups to ease optimization.

A2-Nets: Double Attention Networks

This work proposes the "double attention block", a novel component that aggregates and propagates informative global features from the entire spatio-temporal space of input images/videos, enabling subsequent convolution layers to access features from the entire space efficiently.

Non-local Neural Networks

This paper presents non-local operations as a generic family of building blocks for capturing long-range dependencies in computer vision and improves object detection/segmentation and pose estimation on the COCO suite of tasks.

Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks

This work proposes a simple, lightweight solution to the issue of limited context propagation in ConvNets, which propagates context across a group of neurons by aggregating responses over their extent and redistributing the aggregates back through the group.

Compact Global Descriptor for Neural Networks

A generic family of lightweight global descriptors for modeling the interactions between positions across different dimensions that enables subsequent convolutions to access the informative global features with negligible computational complexity and parameters is presented.

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

This work addresses the task of semantic image segmentation with Deep Learning, proposes atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales, and improves the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models.

Deformable ConvNets V2: More Deformable, Better Results

This work presents a reformulation of Deformable Convolutional Networks that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training, and guides network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features.

Deformable Convolutional Networks

This work introduces two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling, based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision.

Convolutional neural networks at constrained time cost

  • Kaiming He, Jian Sun
  • Computer Science
    2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2015
This paper investigates the accuracy of CNNs under constrained time cost, and presents an architecture that achieves very competitive accuracy in the ImageNet dataset, yet is 20% faster than “AlexNet” [14] (16.0% top-5 error, 10-view test).

CCNet: Criss-Cross Attention for Semantic Segmentation

  • Zilong Huang, Xinggang Wang, Wenyu Liu
  • Computer Science
    2019 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2019
This work proposes a Criss-Cross Network (CCNet) for obtaining contextual information in a more effective and efficient way, and achieves mIoU scores of 81.4 and 45.22 on the Cityscapes test set and the ADE20K validation set, respectively, which are the new state-of-the-art results.