Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
@article{Song2021WhyAM,
  title   = {Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?},
  author  = {Yue Song and N. Sebe and Wei Wang},
  journal = {2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year    = {2021},
  pages   = {1095-1103}
}
Global Covariance Pooling (GCP) aims at exploiting the second-order statistics of convolutional features, and its effectiveness in boosting the classification performance of Convolutional Neural Networks (CNNs) has been demonstrated. In GCP, Singular Value Decomposition (SVD) is used to compute the matrix square root. However, the approximate matrix square root calculated using the Newton-Schulz iteration [14] outperforms the accurate one computed via SVD [15]. We empirically analyze the reason…
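The contrast drawn in the abstract can be made concrete. Below is a minimal sketch, in PyTorch, of the two routes to the matrix square root: the coupled Newton-Schulz iteration (matrix multiplications only, hence GPU-friendly) against the "accurate" root from an eigendecomposition of the SPD covariance. The helper name newton_schulz_sqrt and the iteration count are illustrative assumptions, not the authors' released code.

```python
import torch

def newton_schulz_sqrt(A, num_iters=5):
    """Approximate square root of an SPD matrix A via coupled
    Newton-Schulz iterations (hypothetical helper; matmuls only)."""
    norm = torch.linalg.matrix_norm(A, ord='fro')
    n = A.shape[-1]
    Y = A / norm                                  # pre-normalize so the iteration converges
    Z = torch.eye(n, dtype=A.dtype, device=A.device)
    I3 = 3.0 * torch.eye(n, dtype=A.dtype, device=A.device)
    for _ in range(num_iters):
        T = 0.5 * (I3 - Z @ Y)
        Y = Y @ T                                 # Y -> A^{1/2} / sqrt(norm)
        Z = T @ Z                                 # Z -> A^{-1/2} * sqrt(norm)
    return Y * norm.sqrt()                        # undo the normalization

# The "accurate" root via eigendecomposition (equivalent to SVD for SPD A).
A = torch.randn(64, 64, dtype=torch.double)
A = A @ A.T + 1e-3 * torch.eye(64, dtype=torch.double)   # make A SPD
w, V = torch.linalg.eigh(A)
exact = V @ torch.diag(w.clamp_min(0).sqrt()) @ V.T
print(torch.dist(newton_schulz_sqrt(A), exact))   # residual shrinks as num_iters grows
```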
10 Citations
Fast Differentiable Matrix Square Root and Inverse Square Root
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022
Two more efficient variants to compute the differentiable matrix square root and the inverse square root are proposed and validated in several real-world applications.
On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022
A network branch dedicated to magnifying the importance of small eigenvalues is proposed; it achieves state-of-the-art performance among GCP methods on three fine-grained benchmarks and is also competitive against other FGVC approaches on larger datasets.
Fast Differentiable Matrix Square Root
- Computer Science · ICLR
- 2022
Two more efficient variants for computing the differentiable matrix square root, based on Matrix Taylor Polynomial (MTP) and Matrix Padé Approximants (MPA), are proposed and yield considerable speed-up compared with SVD or the Newton-Schulz iteration.
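For intuition, the MTP route can be sketched in a few lines: pre-normalize A by its Frobenius norm, then truncate the binomial series of (I - X)^{1/2} with X = I - A/||A||_F. This only illustrates the Taylor-polynomial idea under those assumptions; the paper's Padé (MPA) variant and its backward pass are not reproduced here.

```python
import torch

def mtp_sqrt(A, degree=8):
    """Hypothetical sketch: truncated Matrix Taylor Polynomial for A^{1/2}."""
    norm = torch.linalg.matrix_norm(A, ord='fro')
    I = torch.eye(A.shape[-1], dtype=A.dtype, device=A.device)
    X = I - A / norm                       # spectrum of X lies in [0, 1) for SPD A
    S, P, coeff = I.clone(), I.clone(), 1.0
    for k in range(1, degree + 1):
        coeff *= (0.5 - (k - 1)) / k       # binomial coefficient C(1/2, k), recursively
        P = P @ X                          # X^k
        S = S + coeff * (-1.0) ** k * P    # add the term C(1/2, k) * (-X)^k
    return S * norm.sqrt()                 # undo the pre-normalization
```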
Orthogonal SVD Covariance Conditioning and Latent Disentanglement
- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2022
This paper systematically studies how to improve covariance conditioning by enforcing orthogonality on the Pre-SVD layer, proposing the Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR).
Batch-efficient EigenDecomposition for Small and Medium Matrices
- Computer Science · ECCV
- 2022
This paper proposes a QR-based eigendecomposition method that performs the decomposition entirely via batched matrix/vector multiplications, processing all matrices simultaneously and thus fully utilizing the power of GPUs.
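As a rough illustration of computing eigendecompositions with nothing but batched matrix products, the sketch below runs a plain, unshifted QR iteration over a whole batch of symmetric matrices at once via PyTorch's batched torch.linalg.qr. The paper's actual algorithm is more sophisticated; this only shows the batching idea, and assumes eigenvalues of distinct magnitude for convergence.

```python
import torch

def batched_qr_eig(A, num_iters=100):
    """Hypothetical sketch: unshifted QR iteration on a (B, n, n) batch
    of symmetric matrices, using only batched QR and matmul."""
    n = A.shape[-1]
    Ak = A.clone()
    Q_acc = torch.eye(n, dtype=A.dtype, device=A.device).expand_as(A).clone()
    for _ in range(num_iters):
        Q, R = torch.linalg.qr(Ak)   # batched QR over all matrices at once
        Ak = R @ Q                   # similarity transform; drifts toward diagonal
        Q_acc = Q_acc @ Q            # accumulate approximate eigenvectors
    return torch.diagonal(Ak, dim1=-2, dim2=-1), Q_acc

B = torch.randn(32, 8, 8, dtype=torch.double)
A = B @ B.transpose(-1, -2) + 1e-2 * torch.eye(8, dtype=torch.double)  # SPD batch
vals, vecs = batched_qr_eig(A)
```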
Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality
- Computer Science · ECCV
- 2022
This paper systematically studies how to improve covariance conditioning by enforcing orthogonality on the Pre-SVD layer, proposing the Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR).
Grouping-matrix based Graph Pooling with Adaptive Number of Clusters
- Computer Science · arXiv
- 2022
This work proposes GMPOOL, a novel differentiable graph pooling architecture that automatically determines the appropriate number of clusters based on the input data and outperforms conventional methods on molecular property prediction tasks.
Convolutional Fine-Grained Classification With Self-Supervised Target Relation Regularization
- Computer Science · IEEE Transactions on Image Processing
- 2022
Inspired by the recent success of mixup-style data augmentation, randomness is introduced into the soft construction of dynamic target relation graphs to further explore the relation diversity of target classes.
A new stable and avoiding inversion iteration for computing matrix square root
- Computer Science · arXiv
- 2022
The high computational efficiency and accuracy of the proposed method are demonstrated by computing the principal square roots of various matrices, showing its advantages over existing methods.
WISEFUSE: Workload Characterization and DAG Transformation for Serverless Workflows
- Computer Science · SIGMETRICS
- 2022
This work proposes WISEFUSE, an automated approach that generates an optimized execution plan for serverless DAGs given a user-specified latency objective or budget, and shows experimentally that it yields significant improvements in E2E latency and cost.
References
SHOWING 1-10 OF 31 REFERENCES
Backpropagation-Friendly Eigendecomposition
- Computer Science · NeurIPS
- 2019
This paper introduces a numerically stable and differentiable approach to leveraging eigenvectors in deep networks that can handle large matrices without splitting them, and proposes PCA denoising as a new normalization strategy for deep networks.
Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization
- Computer Science · 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
This work proposes an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks; it is much faster than EIG- or SVD-based methods since it involves only matrix multiplications, which suit parallel implementation on GPUs.
Introductory Lectures on Convex Optimization - A Basic Course
- Computer Science · Applied Optimization
- 2004
It was in the middle of the 1980s that the seminal paper by Karmarkar opened a new epoch in nonlinear optimization; it became increasingly common for new methods to be accompanied by a complexity analysis, which was considered a better justification of their efficiency than computational experiments.
Improved Bilinear Pooling with CNNs
- Computer Science · BMVC
- 2017
This paper investigates various ways of normalizing second-order statistics of convolutional features to improve their representation power, and finds that matrix square-root normalization offers significant improvements and outperforms alternative schemes such as matrix logarithm normalization when combined with element-wise square-root and l2 normalization.
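The normalization scheme this summary describes can be sketched directly: take the matrix square root of the (symmetric PSD) bilinear matrix via an eigendecomposition, then apply an element-wise signed square-root and l2 normalization to the flattened result. The helper name and the eps value are assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def matrix_sqrt_normalize(A, eps=1e-10):
    """Hypothetical sketch: matrix square-root normalization of a
    symmetric PSD bilinear matrix A, then signed sqrt + l2."""
    w, V = torch.linalg.eigh(A)
    A_sqrt = V @ torch.diag(w.clamp_min(0).sqrt()) @ V.T   # A^{1/2}
    z = A_sqrt.flatten()
    z = torch.sign(z) * torch.sqrt(z.abs() + eps)          # element-wise signed sqrt
    return F.normalize(z, dim=0)                           # l2 normalization
```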
Matrix Backpropagation for Deep Networks with Structured Layers
- Computer Science · 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
A sound mathematical apparatus is developed to formally integrate global structured computation into deep architectures, and it is demonstrated that deep networks relying on second-order pooling and normalized-cuts layers, trained end-to-end using matrix backpropagation, outperform counterparts that do not take advantage of such global layers.
Deep Residual Learning for Image Recognition
- Computer Science · 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2016
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Bilinear CNN Models for Fine-Grained Visual Recognition
- Computer Science · 2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using the outer product at each location of the image and pooled to obtain an…
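A minimal sketch of the pooling step, assuming two (C, H, W) feature maps: the per-location outer products, averaged over all locations, reduce to a single matrix product after flattening the spatial dimensions.

```python
import torch

def bilinear_pool(X, Y):
    """Hypothetical sketch: average per-location outer products of two
    feature maps X (C1, H, W) and Y (C2, H, W) into a (C1, C2) descriptor."""
    C1, H, W = X.shape
    x = X.reshape(C1, H * W)
    y = Y.reshape(Y.shape[0], H * W)
    return x @ y.T / (H * W)       # mean of outer products over all H*W locations

desc = bilinear_pool(torch.randn(256, 14, 14), torch.randn(256, 14, 14))
print(desc.shape)                  # torch.Size([256, 256])
```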
ImageNet: A large-scale hierarchical image database
- Computer Science · 2009 IEEE Conference on Computer Vision and Pattern Recognition
- 2009
A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
An Investigation Into the Stochasticity of Batch Whitening
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper quantitatively investigates the stochasticity of different whitening transformations, shows that it correlates well with optimization behavior during training, and provides a framework for designing and comparing BW algorithms in different scenarios.
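For reference, one of the whitening transformations such a study compares, ZCA batch whitening, can be sketched as follows: center a mini-batch and multiply it by the inverse square root of its covariance. The helper is a hypothetical illustration, not the paper's implementation.

```python
import torch

def zca_whiten(X, eps=1e-5):
    """Hypothetical sketch: ZCA whitening of an (N, C) mini-batch."""
    Xc = X - X.mean(dim=0, keepdim=True)              # center the batch
    cov = Xc.T @ Xc / X.shape[0] + eps * torch.eye(X.shape[1])
    w, V = torch.linalg.eigh(cov)
    W = V @ torch.diag(w.rsqrt()) @ V.T               # cov^{-1/2}: the ZCA transform
    return Xc @ W                                     # whitened, decorrelated features
```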
What Deep CNNs Benefit From Global Covariance Pooling: An Optimization Perspective
- Computer Science · 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper explores the effect of GCP on deep CNNs in terms of the Lipschitzness of the optimization loss and the predictiveness of gradients, and shows that GCP can make the optimization landscape smoother and the gradients more predictive.