Is Second-Order Information Helpful for Large-Scale Visual Recognition?

  • P. Li, Jiangtao Xie, Qilong Wang, Wangmeng Zuo
  • Published 23 March 2017
  • Computer Science
  • 2017 IEEE International Conference on Computer Vision (ICCV)
By stacking layers of convolution and nonlinearity, convolutional networks (ConvNets) effectively learn from low-level to high-level features and discriminative representations. Since the end goal of large-scale recognition is to delineate complex boundaries of thousands of classes, adequate exploration of feature distributions is important for realizing the full potential of ConvNets. However, state-of-the-art works concentrate only on deeper or wider architecture design, while rarely exploring… 


Global Second-Order Pooling Convolutional Networks
A novel network model is proposed that introduces GSoP from lower to higher layers, making full use of the second-order statistics of the holistic image throughout the network.
Detachable Second-Order Pooling: Toward High-Performance First-Order Networks.
This work presents a novel architecture, namely a detachable second-order pooling network, to leverage the advantage of second-order pooling by first-order networks while keeping the model complexity unchanged during inference.
Second-order Attention Guided Convolutional Activations for Visual Recognition
This work makes an attempt to combine deep second-order statistics with attention mechanisms in ConvNets, and further proposes a novel Second-order Attention Guided Network (SoAG-Net) for visual recognition that outperforms its counterparts and achieves competitive performance with state-of-the-art models under the same backbone.
Multi-Order Feature Statistical Model for Fine-Grained Visual Categorization
A multi-order feature statistical method (MOFS) that learns fine-grained features characterizing multiple orders by deploying two sub-modules on top of existing backbone networks, simultaneously capturing multiple levels of discriminative patterns, including local, global, and co-related patterns.
Second-Order Attention Network for Single Image Super-Resolution
Experimental results demonstrate the superiority of the SAN network over state-of-the-art SISR methods in terms of both quantitative metrics and visual quality.
MoNet: Moments Embedding Network
This paper unifies bilinear pooling and the global Gaussian embedding layers through the empirical moment matrix and proposes a novel sub-matrix square-root layer, which can be used to normalize the output of the convolution layer directly and mitigate the dimensionality problem with off-the-shelf compact pooling methods.
PCANet-II: When PCANet Meets the Second Order Pooling
Compared with the histogram-based output, the second order pooling not only provides more discriminative information by preserving both the magnitude and sign of convolutional responses, but also dramatically reduces the size of output features.
On the Eigenvalues of Global Covariance Pooling for Fine-grained Visual Recognition
A network branch dedicated to magnifying the importance of small eigenvalues is proposed that achieves state-of-the-art performances of GCP methods on three fine-grained benchmarks and is also competitive against other FGVC approaches on larger datasets.
Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks
A sparsity-constrained gating mechanism is introduced and a novel parametric SOP is proposed as a component of the mixture model, which can flexibly accommodate a large number of personalized SOP candidates in an efficient way, leading to richer representations of deep CNNs.


Very Deep Convolutional Networks for Large-Scale Image Recognition
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Bilinear CNN Models for Fine-Grained Visual Recognition
We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using outer product at each location of the image and pooled to obtain an
Training Deep Networks with Structured Layers by Matrix Backpropagation
Deep neural network architectures have recently produced excellent results in a variety of areas in artificial intelligence and visual recognition, well surpassing traditional shallow architectures
Matrix Backpropagation for Deep Networks with Structured Layers
A sound mathematical apparatus to formally integrate global structured computation into deep computation architectures and demonstrates that deep networks relying on second-order pooling and normalized cuts layers, trained end-to-end using matrix backpropagation, outperform counterparts that do not take advantage of such global layers.
Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition
This paper introduces Higher-order Kernel (HOK) descriptors generated from the late fusion of CNN classifier scores from all the frames in a sequence, using the idea of kernel linearization.
Return of the Devil in the Details: Delving Deep into Convolutional Nets
It is shown that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods, and result in an analogous performance boost, and it is identified that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition
Semantic Segmentation with Second-Order Pooling
This paper introduces multiplicative second-order analogues of average and max-pooling that together with appropriate non-linearities lead to state-of-the-art performance on free-form region recognition, without any type of feature coding.
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results for the detection and classification tasks.
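
The Bilinear CNN entry above describes multiplying the outputs of two feature extractors with an outer product at each image location and then pooling the results. A minimal sketch of that idea follows, in plain Python. The function name `bilinear_pool`, the toy feature values, and the choice of average pooling are assumptions made for illustration and are not taken from any of the listed papers; sum pooling would differ only by a constant factor.

```python
def bilinear_pool(feats_a, feats_b):
    """Average the outer products a_i * b_i^T over all spatial locations.

    feats_a: list of N feature vectors (each of length Da), one per location.
    feats_b: list of N feature vectors (each of length Db), same locations.
    Returns a Da x Db matrix as a list of lists.
    Hypothetical helper for illustration only.
    """
    if len(feats_a) != len(feats_b) or not feats_a:
        raise ValueError("need the same non-zero number of locations")
    da, db = len(feats_a[0]), len(feats_b[0])
    pooled = [[0.0] * db for _ in range(da)]
    # Accumulate the outer product at every location.
    for a, b in zip(feats_a, feats_b):
        for i in range(da):
            for j in range(db):
                pooled[i][j] += a[i] * b[j]
    # Average pooling over locations (an assumption; see the note above).
    n = len(feats_a)
    return [[v / n for v in row] for row in pooled]


# Toy input: two locations, 2-D features from each hypothetical extractor.
fa = [[1.0, 0.0], [0.0, 2.0]]
fb = [[1.0, 1.0], [3.0, 0.0]]
pooled = bilinear_pool(fa, fb)

# Passing the same features twice gives a symmetric second-order
# (uncentered) moment matrix of a single feature map.
second_order = bilinear_pool(fa, fa)
```

Passing one feature map as both arguments yields a symmetric matrix of second-order moments, which is the kind of statistic the second-order pooling entries in these lists build on; the normalization steps those works apply afterwards are not shown here.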