What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective
@article{Wang2020WhatDC,
  title   = {What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective},
  author  = {Qilong Wang and Li Zhang and Banggu Wu and Dongwei Ren and P. Li and Wangmeng Zuo and Qinghua Hu},
  journal = {2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year    = {2020},
  pages   = {10768-10777}
}
Recent works have demonstrated that global covariance pooling (GCP) can improve the performance of deep convolutional neural networks (CNNs) on visual classification tasks. Despite considerable advances, the reasons for the effectiveness of GCP in deep CNNs have not been well studied. In this paper, we make an attempt to understand what deep CNNs benefit from GCP from an optimization perspective. Specifically, we explore the effect of GCP on deep CNNs in terms of the Lipschitzness of…
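The abstract is truncated above; for orientation, a minimal sketch of a plain GCP layer (second-order pooling of the last convolutional feature map, before any matrix normalization) might look like the following. This is an illustrative reconstruction, not the authors' code; the function name and the (N, C, H, W) PyTorch tensor layout are assumptions.

```python
import torch

def global_covariance_pooling(x: torch.Tensor) -> torch.Tensor:
    """Second-order (covariance) pooling of convolutional features.

    x: feature map of shape (N, C, H, W) from the last convolutional layer.
    Returns one C x C sample covariance matrix per image, which GCP networks
    typically normalize (e.g., by a matrix square root) before the classifier.
    """
    n, c, h, w = x.shape
    m = h * w
    feats = x.reshape(n, c, m)                     # M = H*W local C-dimensional descriptors
    centered = feats - feats.mean(dim=2, keepdim=True)
    cov = centered @ centered.transpose(1, 2) / m  # (N, C, C) covariance over spatial positions
    return cov
```

In GCP networks, the (N, C, C) output is usually normalized (for example by a matrix square root, as in several of the references listed below) and its upper triangle is flattened before the fully connected classifier.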
11 Citations
So-ViT: Mind Visual Tokens for Vision Transformer
- Computer Science, ArXiv
- 2021
This paper proposes a new classification paradigm, in which second-order, cross-covariance pooling of visual tokens is combined with the class token for final classification, and develops a lightweight, hierarchical module based on off-the-shelf convolutions for visual token embedding.
Graph attention mechanism with global contextual information for multi-label image recognition
- Computer Science, J. Electronic Imaging
- 2021
Compared with the classical ML-GCN, the proposed model can better combine image features with label embeddings, and it outperforms state-of-the-art methods such as the residual multi-layer perceptron, EfficientNet, and the vision transformer.
Robustness in Deep Learning for Computer Vision: Mind the gap?
- Computer Science, ArXiv
- 2021
A structural causal model of the data-generating process is introduced, and non-adversarial robustness is interpreted as pertaining to a model’s behavior on corrupted images that correspond to low-probability samples from the unaltered data distribution, revealing that common practices in the current literature correspond to causal concepts.
Fine-Grained Image Analysis with Deep Learning: A Survey
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2021
A systematic survey of recent advances in deep-learning-powered FGIA is presented, which attempts to re-define and broaden the field of FGIA by consolidating two fundamental fine-grained research areas: fine-grained image recognition and fine-grained image retrieval.
SoT: Delving Deeper into Classification Head for Transformer
- Computer Science
- 2021
This paper empirically discloses that high-level word tokens contain rich information, are competitive on their own for classification, and are moreover complementary to the classification token; it proposes multi-headed global cross-covariance pooling with singular value power normalization, which shares a similar philosophy with, and is therefore compatible with, the transformer block, outperforming commonly used pooling methods.
Conditional Temporal Neural Processes with Covariance Loss
- Computer Science, ICML
- 2021
We introduce a novel loss function, Covariance Loss, which is conceptually equivalent to conditional neural processes and has a form of regularization, so that it is applicable to many kinds of neural…
Deep Semantic-Preserving Reconstruction Hashing for Unsupervised Cross-Modal Retrieval
- Computer Science, Entropy
- 2020
This work introduces a new spatial pooling network module based on tensor regular-polymorphic decomposition theory to generate rank-1 tensors that capture high-order context semantics, which can assist the backbone network in capturing important contextual modal semantic information.
Remote Sensing and Social Sensing Data Fusion for Fine-Resolution Population Mapping With a Multimodel Neural Network
- Environmental Science, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
- 2021
A multimodel fusion neural network is proposed, which combines a convolutional neural network and a multilayer perceptron (MLP) to estimate a fine-resolution population map, and which can identify differences in population density in densely populated areas and some remote population clusters more accurately than the WorldPop population dataset.
Infinite-dimensional feature aggregation via a factorized bilinear model
- Computer Science, Pattern Recognit.
- 2022
References
Showing 1-10 of 51 references
G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
Experimental results on large-scale region classification and fine-grained recognition tasks show that G2DeNet is superior to its counterparts and capable of achieving state-of-the-art performance.
Improved Bilinear Pooling with CNNs
- Computer Science, BMVC
- 2017
This paper investigates various ways of normalizing the second-order statistics of convolutional features to improve their representational power, and finds that matrix square-root normalization offers significant improvements, outperforming alternative schemes such as matrix logarithm normalization when combined with elementwise square-root and l2 normalization.
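As a companion to this entry, the combination of normalizations it describes can be sketched roughly as follows. This is a hedged illustration assuming PyTorch, with the symmetric matrix square root computed by a plain eigendecomposition for readability rather than by the paper's exact procedure; the function name and the (N, C, H, W) layout are assumptions.

```python
import torch
import torch.nn.functional as F

def improved_bilinear_pooling(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Illustrative pipeline: bilinear (second-order) pooling, matrix square
    root, elementwise signed square root, then l2 normalization.

    x: convolutional features of shape (N, C, H, W).
    """
    n, c, h, w = x.shape
    feats = x.reshape(n, c, h * w)
    bilinear = feats @ feats.transpose(1, 2) / (h * w)     # (N, C, C) second-order statistic

    # Symmetric matrix square root via eigendecomposition.
    eigvals, eigvecs = torch.linalg.eigh(bilinear)
    sqrt_mat = eigvecs @ torch.diag_embed(eigvals.clamp(min=0).sqrt()) @ eigvecs.transpose(1, 2)

    vec = sqrt_mat.reshape(n, -1)
    vec = torch.sign(vec) * torch.sqrt(vec.abs() + eps)    # elementwise signed square root
    return F.normalize(vec, dim=1)                         # l2 normalization
```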
Understanding Batch Normalization
- Computer Science, NeurIPS
- 2018
It is shown that BN primarily enables training with larger learning rates, which is the cause of faster convergence and better generalization; the results are contrasted against recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.
Kernel Pooling for Convolutional Neural Networks
- Computer Science, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
This work demonstrates how to approximate kernels such as the Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner, and proposes a general pooling framework that captures higher-order interactions of features in the form of kernels.
Interpreting Deep Visual Representations via Network Dissection
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2019
Network Dissection is described, a method that interprets networks by providing meaningful labels to their individual units that reveals that deep representations are more transparent and interpretable than they would be under a random equivalently powerful basis.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- Computer Science, IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
MoNet: Moments Embedding Network
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This paper unifies bilinear pooling and global Gaussian embedding layers through the empirical moment matrix, and proposes a novel sub-matrix square-root layer that can be used to normalize the output of the convolution layer directly and to mitigate the dimensionality problem with off-the-shelf compact pooling methods.
Deep Residual Learning for Image Recognition
- Computer Science, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Is Second-Order Information Helpful for Large-Scale Visual Recognition?
- Computer Science, 2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
This work proposes a Matrix Power Normalized Covariance (MPN-COV) method and develops forward and backward propagation formulas for the involved nonlinear matrix functions, so that MPN-COV can be trained end-to-end, and analyzes both qualitatively and quantitatively its advantage over the well-known Log-Euclidean metric.
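For reference, the matrix power normalization this entry refers to can be written compactly: given the sample covariance with eigendecomposition Σ = U Λ Uᵀ, the covariance is replaced by its matrix power, with the exponent α typically set around 1/2. The formula below is a standard statement of the idea, not a quotation from the paper.

```latex
% Matrix power normalization of the sample covariance (alpha ~ 1/2).
\Sigma = U \Lambda U^{\top},
\qquad
\Sigma^{\alpha} \;=\; U \Lambda^{\alpha} U^{\top}
\;=\; U \,\operatorname{diag}\!\left(\lambda_1^{\alpha}, \ldots, \lambda_C^{\alpha}\right) U^{\top},
\qquad \alpha \approx \tfrac{1}{2}.
```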
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This work proposes an iterative matrix square-root normalization method for fast end-to-end training of global covariance pooling networks, which is much faster than EIG- or SVD-based methods since it involves only matrix multiplications, making it suitable for parallel implementation on GPUs.
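The "matrix multiplications only" point can be illustrated with the coupled Newton-Schulz iteration that this kind of method builds on. The sketch below is an illustrative PyTorch version with an assumed function name and fixed iteration count, not the authors' released implementation.

```python
import torch

def newton_schulz_sqrt(A: torch.Tensor, num_iters: int = 5) -> torch.Tensor:
    """Approximate matrix square root of a batch of SPD matrices using
    coupled Newton-Schulz iterations (matrix multiplications only).

    A: (N, C, C) symmetric positive (semi-)definite matrices, e.g. covariances.
    """
    n, c, _ = A.shape
    identity = torch.eye(c, dtype=A.dtype, device=A.device).expand(n, c, c)

    # Pre-normalize by the trace so the iteration converges.
    trace = A.diagonal(dim1=1, dim2=2).sum(dim=1).view(n, 1, 1)
    Y = A / trace
    Z = identity.clone()

    for _ in range(num_iters):
        T = 0.5 * (3.0 * identity - Z @ Y)
        Y = Y @ T        # Y converges to (A / trace)^(1/2)
        Z = T @ Z        # Z converges to (A / trace)^(-1/2)

    # Undo the trace normalization (post-compensation).
    return Y * trace.sqrt()
```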