What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective

@inproceedings{Wang2020WhatDC,
  title={What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective},
  author={Qilong Wang and Li Zhang and Banggu Wu and Dongwei Ren and P. Li and Wangmeng Zuo and Qinghua Hu},
  booktitle={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={10768-10777}
}
  • Published 25 March 2020
Recent works have demonstrated that global covariance pooling (GCP) can improve the performance of deep convolutional neural networks (CNNs) on visual classification tasks. Despite considerable advances, the reasons for the effectiveness of GCP in deep CNNs have not been well studied. In this paper, we make an attempt to understand what deep CNNs benefit from GCP from an optimization perspective. Specifically, we explore the effect of GCP on deep CNNs in terms of the Lipschitzness of…
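To make the pooling operation itself concrete, here is a minimal NumPy sketch of plain GCP: the final convolutional feature map is summarized by the covariance matrix of its channel responses over spatial positions, rather than by global average pooling. The function name and shapes are illustrative assumptions, not the authors' code, and it omits the matrix normalization variants the paper analyzes.

```python
import numpy as np

def global_covariance_pooling(feat):
    """Plain global covariance pooling (GCP) of one conv feature map.

    feat: array of shape (C, H, W), channel-first convolutional features.
    Returns the C x C covariance matrix of the spatial feature vectors.
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)             # C x N, N = H*W spatial positions
    x = x - x.mean(axis=1, keepdims=True)  # center each channel
    return (x @ x.T) / (h * w)             # C x C covariance matrix

# Example: 8 channels over a 4x4 spatial grid
rng = np.random.default_rng(0)
cov = global_covariance_pooling(rng.standard_normal((8, 4, 4)))
print(cov.shape)  # (8, 8)
```

The resulting C x C matrix is symmetric positive semi-definite and captures second-order (pairwise channel) statistics, which is what distinguishes GCP from first-order global average pooling.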
So-ViT: Mind Visual Tokens for Vision Transformer
TLDR
This paper proposes a new classification paradigm in which second-order, cross-covariance pooling of visual tokens is combined with the class token for final classification, and develops a lightweight, hierarchical module based on off-the-shelf convolutions for visual token embedding.
Graph attention mechanism with global contextual information for multi-label image recognition
TLDR
Compared with the classical ML-GCN, the model can better combine image features with label embeddings, and it outperforms state-of-the-art methods such as the residual multi-layer perceptron, EfficientNet, and the vision transformer.
Robustness in Deep Learning for Computer Vision: Mind the gap?
TLDR
A structural causal model of the data-generating process is introduced, and non-adversarial robustness is interpreted as pertaining to a model's behavior on corrupted images that correspond to low-probability samples from the unaltered data distribution, revealing that common practices in the current literature correspond to causal concepts.
Fine-Grained Image Analysis with Deep Learning: A Survey
TLDR
A systematic survey of recent advances in deep-learning-powered FGIA is presented, which attempts to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas: fine-grained image recognition and fine-grained image retrieval.
SoT: Delving Deeper into Classification Head for Transformer
TLDR
This paper empirically discloses that high-level word tokens contain rich information, which per se is very competent for classification and, moreover, is complementary to the classification token; it proposes multi-headed global cross-covariance pooling with singular value power normalization, which shares a similar philosophy with, and is thus compatible with, the transformer block, outperforming commonly used pooling methods.
Conditional Temporal Neural Processes with Covariance Loss
We introduce a novel loss function, Covariance Loss, which is conceptually equivalent to conditional neural processes and acts as a form of regularization, making it applicable to many kinds of neural networks.
Deep Semantic-Preserving Reconstruction Hashing for Unsupervised Cross-Modal Retrieval
TLDR
This work introduces a new spatial pooling network module based on tensor regular-polymorphic decomposition theory to generate rank-1 tensors that capture high-order context semantics, which assists the backbone network in capturing important contextual modal semantic information.
Remote Sensing and Social Sensing Data Fusion for Fine-Resolution Population Mapping With a Multimodel Neural Network
TLDR
A multimodel fusion neural network is proposed, which combines a convolutional neural network and a multilayer perceptron (MLP) to estimate fine-resolution population maps, and can identify differences in population density in densely populated areas and in some remote population clusters more accurately than the WorldPop population dataset.

References

Showing 1–10 of 51 references
G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition
TLDR
Experimental results on large-scale region classification and fine-grained recognition tasks show that G2DeNet is superior to its counterparts, achieving state-of-the-art performance.
Improved Bilinear Pooling with CNNs
TLDR
This paper investigates various ways of normalizing second-order statistics of convolutional features to improve their representational power, and finds that matrix square-root normalization offers significant improvements, outperforming alternative schemes such as matrix logarithm normalization when combined with element-wise square-root and l2 normalization.
Understanding Batch Normalization
TLDR
It is shown that BN primarily enables training with larger learning rates, which is the cause for faster convergence and better generalization, and contrasts the results against recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.
Kernel Pooling for Convolutional Neural Networks
TLDR
This work demonstrates how to approximate kernels such as Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner and proposes a general pooling framework that captures higher order interactions of features in the form of kernels.
Interpreting Deep Visual Representations via Network Dissection
TLDR
Network Dissection is described, a method that interprets networks by providing meaningful labels to their individual units that reveals that deep representations are more transparent and interpretable than they would be under a random equivalently powerful basis.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TLDR
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges the RPN and Fast R-CNN into a single network by sharing their convolutional features.
MoNet: Moments Embedding Network
TLDR
This paper unifies bilinear pooling and global Gaussian embedding layers through the empirical moment matrix, and proposes a novel sub-matrix square-root layer that can be used to normalize the output of the convolution layer directly and to mitigate the dimensionality problem with off-the-shelf compact pooling methods.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Is Second-Order Information Helpful for Large-Scale Visual Recognition?
TLDR
A Matrix Power Normalized Covariance (MPN-COV) method is proposed that develops forward and backward propagation formulas for the nonlinear matrix functions involved, so that MPN-COV can be trained end-to-end; its advantage over the well-known Log-Euclidean metric is analyzed both qualitatively and quantitatively.
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
TLDR
This work proposes an iterative matrix square-root normalization method for fast end-to-end training of global covariance pooling networks; it is much faster than EIG- or SVD-based methods, since it involves only matrix multiplications, which are well suited to parallel implementation on GPUs.
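The "only matrix multiplications" property described above is characteristic of Newton-Schulz-style iterations for the matrix square root. The sketch below is an illustrative assumption of that general scheme, not the paper's implementation; the function name, trace pre-normalization, and iteration count are my own choices.

```python
import numpy as np

def newton_schulz_sqrt(a, iters=10):
    """Approximate the square root of an SPD matrix using only matmuls.

    a: symmetric positive-definite matrix.
    Returns Y with Y @ Y approximately equal to a.
    """
    n = a.shape[0]
    norm = np.trace(a)        # pre-normalize so the iteration converges
    y = a / norm
    z = np.eye(n)
    for _ in range(iters):
        t = 0.5 * (3.0 * np.eye(n) - z @ y)  # coupled Newton-Schulz step
        y = y @ t
        z = t @ z
    return y * np.sqrt(norm)  # undo the pre-normalization

# Sanity check on a random well-conditioned SPD matrix
rng = np.random.default_rng(1)
m = rng.standard_normal((6, 6))
spd = m @ m.T + 6 * np.eye(6)
s = newton_schulz_sqrt(spd)
print(np.max(np.abs(s @ s - spd)))
```

Because every step is a matrix multiplication, the whole loop maps directly onto GPU batched-matmul kernels, unlike eigendecomposition- or SVD-based square roots.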