Global Second-Order Pooling Convolutional Networks

  • Zilin Gao, Jiangtao Xie, Qilong Wang, P. Li
  • Published 29 November 2018
  • Computer Science
  • 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Deep Convolutional Networks (ConvNets) are fundamental not only to large-scale visual recognition but to many other vision tasks. As the primary goal of ConvNets is to characterize the complex boundaries of thousands of classes in a high-dimensional space, it is critical to learn higher-order representations that enhance their non-linear modeling capability. Recently, Global Second-order Pooling (GSoP), plugged in at the end of networks, has attracted increasing attention, achieving much better performance… 
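The full GSoP block additionally applies convolutions, nonlinearities, and attention weighting around this statistic, but its core operation is covariance pooling of the channel features. A minimal NumPy sketch (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def global_second_order_pooling(feature_map):
    # feature_map: (C, H, W) convolutional features.
    # Returns the C x C covariance matrix of the channel descriptors,
    # capturing pairwise channel correlations rather than the
    # per-channel means used by global average pooling.
    c, h, w = feature_map.shape
    x = feature_map.reshape(c, h * w)      # C x N, one column per spatial position
    x = x - x.mean(axis=1, keepdims=True)  # center each channel
    return (x @ x.T) / (h * w)             # C x C covariance

features = np.random.rand(8, 4, 4)
cov = global_second_order_pooling(features)
```

The resulting C x C matrix is symmetric, and its diagonal holds the per-channel variances; this richer statistic is what gives second-order pooling its extra modeling capacity over first-order averages.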

Figures and Tables from this paper

Rotate to Attend: Convolutional Triplet Attention Module

The method is simple and efficient, can be easily plugged into classic backbone networks as an add-on module, and supports the intuition that capturing dependencies across dimensions is important when computing attention weights.

Learning Deep Bilinear Transformation for Fine-grained Image Representation

A deep bilinear transformation (DBT) block that can be deeply stacked in convolutional neural networks to learn fine-grained image representations, achieving new state-of-the-art results on several fine-grained image recognition benchmarks, including CUB-Bird, Stanford-Car, and FGVC-Aircraft.

SDA-xNet: Selective Depth Attention Networks for Adaptive Multi-scale Feature Representation

This work introduces a new attention dimension, i.e., depth, in addition to existing attention dimensions such as channel, spatial, and branch, and presents a novel selective depth attention network to symmetrically handle multi-scale objects in various vision tasks.

BA-Net: Bridge Attention for Deep Convolutional Neural Networks

Comprehensive evaluation demonstrates that the proposed approach achieves state-of-the-art (SOTA) performance in accuracy and speed compared with existing methods, showing that Bridge Attention provides a new perspective on the design of neural network architectures with great potential for improving performance.

Gaussian Context Transformer

This paper proposes a simple yet extremely efficient channel attention block, called Gaussian Context Transformer (GCT), which achieves contextual feature excitation using a Gaussian function that satisfies the presupposed relationship.

Attention-Based Second-Order Pooling Network for Hyperspectral Image Classification

Experimental results demonstrate that A-SPN outperforms other traditional and state-of-the-art DL-based HSI classification methods in terms of generalization performance with limited training samples, classification accuracy, convergence rate, and computational complexity.

MBMR-Net: multi-branches multi-resolution cross-projection network for single image super-resolution

A novel attention unit is introduced that integrates second-order channel attention with spatial attention to better fuse information from multiple resolutions in a deep network called the multi-branches multi-resolution cross-projection network (MBMR-Net).

Exemplar Normalization for Learning Deep Representation

This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.

Efficient-Receptive Field Block with Group Spatial Attention Mechanism for Object Detection

A novel multibranch module called Efficient-Receptive Field Block (E-RFB) is proposed, in which multiple levels of features are combined for network optimization; it achieves superior performance compared with state-of-the-art methods based on a similar framework.

References
Is Second-Order Information Helpful for Large-Scale Visual Recognition?

A Matrix Power Normalized Covariance (MPN-COV) method that develops forward and backward propagation formulas for the nonlinear matrix functions involved, so that MPN-COV can be trained end-to-end, and analyzes both qualitatively and quantitatively its advantage over the well-known Log-Euclidean metric.
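The matrix power normalization at the heart of MPN-COV can be sketched via eigendecomposition: raise the eigenvalues of the covariance matrix to a power (typically 0.5) and reassemble. The paper's contribution includes deriving backpropagation through this step; the sketch below covers only the forward computation, with illustrative names:

```python
import numpy as np

def matrix_power_normalize(cov, alpha=0.5):
    # Eigendecompose the symmetric covariance, raise the (non-negative)
    # eigenvalues to the power alpha, and reassemble the matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None) ** alpha
    return (eigvecs * eigvals) @ eigvecs.T

rng = np.random.default_rng(0)
cov = np.cov(rng.standard_normal((6, 20)))
mpn = matrix_power_normalize(cov)
# With alpha = 0.5 this is the matrix square root: mpn @ mpn ≈ cov
```

The power normalization shrinks large eigenvalues relative to small ones, which is the property the paper credits for its advantage over logarithm-based (Log-Euclidean) normalization.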

Kernel Pooling for Convolutional Neural Networks

This work demonstrates how to approximate kernels such as Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner and proposes a general pooling framework that captures higher order interactions of features in the form of kernels.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition

Experimental results on large scale region classification and fine-grained recognition tasks show that G2DeNet is superior to its counterparts, capable of achieving state-of-the-art performance.

FASON: First and Second Order Information Fusion Network for Texture Recognition

This work proposes an effective fusion architecture, FASON, that combines second-order and first-order information flows within different convolutional layers and achieves improvements over state-of-the-art methods on several benchmark datasets.

Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks

This work proposes a simple, lightweight solution to the issue of limited context propagation in ConvNets, which propagates context across a group of neurons by aggregating responses over their extent and redistributing the aggregates back through the group.
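The aggregate-and-redistribute idea can be sketched in its simplest parameter-free form, assuming global average pooling as the gather operator and a sigmoid gate as the excite operator (a simplification of the paper's family of operators, with names of my own choosing):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gather_excite(feature_map):
    # Gather: aggregate each channel over its full spatial extent.
    gathered = feature_map.mean(axis=(1, 2), keepdims=True)  # C x 1 x 1
    # Excite: redistribute the aggregate as a per-channel gate,
    # broadcast back over all spatial positions.
    return feature_map * sigmoid(gathered)

x = np.random.rand(3, 5, 5)
y = gather_excite(x)
```

Because the gate is shared across all positions of a channel, each neuron's response is modulated by context far beyond its own receptive field.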

Network In Network

With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
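Global average pooling, as used in NIN's classification layer, reduces each feature map to a single number, so feature maps correspond directly to category confidences with no fully connected parameters to overfit. A one-line sketch:

```python
import numpy as np

def global_average_pooling(feature_maps):
    # Collapse a (C, H, W) stack of feature maps to a C-dim vector
    # by averaging over all spatial positions of each map.
    return feature_maps.mean(axis=(1, 2))

# Each of the 10 maps holds a constant value, so pooling recovers it.
x = np.ones((10, 7, 7)) * np.arange(10).reshape(10, 1, 1)
pooled = global_average_pooling(x)
```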

Non-linear Convolution Filters for CNN-Based Learning

This work addresses the issue of developing a convolution method in the context of a computational model of the visual cortex, exploring quadratic forms through the Volterra kernels, and shows that a network which combines linear and non-linear filters in its convolutional layers, can outperform networks that use standard linear filters with the same architecture.

Compact Bilinear Pooling

Two compact bilinear representations are proposed with the same discriminative power as the full bilinear representation but with only a few thousand dimensions, allowing back-propagation of classification errors and enabling end-to-end optimization of the visual recognition system.
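One of the compact representations in this line of work is the Tensor Sketch, which approximates the flattened outer product through count sketches combined by FFT-based circular convolution. A hedged sketch (the hash initialization and projected dimension d=128 are illustrative choices, not the paper's settings):

```python
import numpy as np

def count_sketch(x, h, s, d):
    # Project x into d dims: bucket index given by hash h, sign by s.
    out = np.zeros(d)
    np.add.at(out, h, s * x)
    return out

def compact_bilinear(x, y, d, rng):
    # Tensor Sketch approximation of the flattened outer product x ⊗ y:
    # circular convolution of two count sketches, done in Fourier space.
    c = x.shape[0]
    h1, h2 = rng.integers(0, d, c), rng.integers(0, d, c)
    s1, s2 = rng.choice([-1, 1], c), rng.choice([-1, 1], c)
    fx = np.fft.fft(count_sketch(x, h1, s1, d))
    fy = np.fft.fft(count_sketch(y, h2, s2, d))
    return np.real(np.fft.ifft(fx * fy))

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
phi = compact_bilinear(x, x, 128, rng)  # 512*512 = 262144 dims -> 128
```

Because the sketch is linear and differentiable in x and y, classification errors can be back-propagated through it, which is what makes end-to-end training possible.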

Bilinear CNN Models for Fine-Grained Visual Recognition

We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using the outer product at each location of the image and pooled to obtain an image descriptor.
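The outer-product-and-pool step collapses to a single matrix product over spatial locations. A minimal sketch with made-up dimensions:

```python
import numpy as np

def bilinear_pool(fa, fb):
    # fa: (Ca, N) and fb: (Cb, N) features from the two extractors
    # at N spatial locations. Summing the per-location outer products
    # fa[:, i] ⊗ fb[:, i] over i equals the matrix product fa @ fb.T,
    # giving a Ca x Cb bilinear descriptor.
    return fa @ fb.T

fa = np.random.rand(4, 9)   # extractor A: 4 channels, 9 locations
fb = np.random.rand(6, 9)   # extractor B: 6 channels, 9 locations
desc = bilinear_pool(fa, fb)
```

Writing the pooling as one matrix product is both how it is implemented efficiently and why the operation stays differentiable for end-to-end training.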