Factorized Bilinear Models for Image Recognition

Yanghao Li, Naiyan Wang, Jiaying Liu, Xiaodi Hou
2017 IEEE International Conference on Computer Vision (ICCV)
Although Deep Convolutional Neural Networks (CNNs) have liberated their power in various computer vision tasks, the most important components of CNN, convolutional layers and fully connected layers, are still limited to linear transformations. In this paper, we propose a novel Factorized Bilinear (FB) layer to model the pairwise feature interactions by considering the quadratic terms in the transformations. Compared with existing methods that tried to incorporate complex non-linearity… 
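The idea of adding a quadratic term to a linear unit can be illustrated with a small sketch. The abstract above is truncated and does not state the layer's formula, so the form used here, y = b + w·x + xᵀ(FᵀF)x with a low-rank factor F of shape k × n, is an assumption made for illustration. The names `fb_unit` and `explicit_quadratic` are hypothetical. The sketch shows that the pairwise interaction term can be computed as the squared norm of Fx, without ever building the n × n interaction matrix.

```python
import random

def fb_unit(x, w, b, F):
    """One output unit: b + w.x + x^T (F^T F) x, with F given as k rows of length n."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    # x^T F^T F x equals ||F x||^2, so the n x n interaction matrix is never formed.
    projected = [sum(fij * xj for fij, xj in zip(row, x)) for row in F]
    quadratic = sum(p * p for p in projected)
    return b + linear + quadratic

def explicit_quadratic(x, F):
    """Reference computation: sum over all pairs (i, j) of x_i * (F^T F)_ij * x_j."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(n):
            interaction = sum(row[i] * row[j] for row in F)  # (F^T F)_ij
            total += x[i] * interaction * x[j]
    return total

# Assumed toy sizes: n input features, k factors (k much smaller than n in practice).
rng = random.Random(0)
n, k = 6, 2
x = [rng.uniform(-1, 1) for _ in range(n)]
w = [rng.uniform(-1, 1) for _ in range(n)]
F = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(k)]
b = 0.1

y = fb_unit(x, w, b, F)
y_ref = b + sum(wi * xi for wi, xi in zip(w, x)) + explicit_quadratic(x, F)
print(abs(y - y_ref) < 1e-9)
```

Under this assumed parameterization, the quadratic term costs O(kn) per unit instead of O(n²), which is the usual motivation for factorizing a pairwise interaction matrix.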
Learning Deep Bilinear Transformation for Fine-grained Image Representation
A deep bilinear transformation (DBT) block is proposed that can be deeply stacked in convolutional neural networks to learn fine-grained image representations; it achieves new state-of-the-art results on several fine-grained image recognition benchmarks, including CUB-Bird, Stanford-Car, and FGVC-Aircraft.
Global Second-Order Pooling Convolutional Networks
A novel network model is proposed that introduces global second-order pooling (GSoP) from lower to higher layers, exploiting holistic image information throughout the network to make full use of its second-order statistics.
Graph Convolutional Network with Generalized Factorized Bilinear Aggregation
A novel generalization of the Factorized Bilinear (FB) layer is proposed to model feature interactions in GCNs by defining a family of summarizing operators applied over the quadratic term; the resulting GFB-GCN is shown to be competitive with other methods for text classification.
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
This paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain, and develops a method to adaptively select kernel size of 1D convolution, determining coverage of local cross-channel interaction.
Approximated Bilinear Modules for Temporal Modeling
It is shown how two-layer subnets in CNNs can be converted to temporal bilinear modules by adding an auxiliary branch, and snippet sampling and shifting inference are introduced to boost sparse-frame video classification performance.
Fine-grained visual classification via multilayer bilinear pooling with object localization
This paper proposes a multilayer bilinear pooling model combined with object localization, which can achieve competitive performance compared with several state-of-the-art methods on fine-grained visual classification tasks.
Temporal Bilinear Networks for Video Action Recognition
This paper proposes a novel Temporal Bilinear (TB) model to capture the temporal pairwise feature interactions between adjacent frames and considers explicit quadratic bilinear transformations in the temporal domain for motion evolution and sequential relation modeling.
Second-Order Response Transform Attention Network for Image Classification
This work proposes a novel Second-order Response Transform Attention Network (SoRTA-Net) for classification tasks, which can be flexibly inserted into existing CNNs without any modification of network topology.
Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering
A Multi-modal Factorized Bilinear (MFB) pooling approach is proposed to efficiently and effectively combine multi-modal features, which results in superior performance for VQA compared with other bilinear pooling approaches.


Return of the Devil in the Details: Delving Deep into Convolutional Nets
It is shown that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods, and result in an analogous performance boost, and it is identified that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance.
Bilinear CNN Models for Fine-Grained Visual Recognition
We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using outer product at each location of the image and pooled to obtain an
Deeply-Supervised Nets
The proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent, and extends techniques from stochastic gradient methods to analyze the algorithm.
One-to-many face recognition with bilinear CNNs
This work applies the bilinear CNN model to the challenging new face recognition benchmark, the IARPA Janus Benchmark A (IJB-A), and demonstrates how a standard CNN pre-trained on a large face database, the recently released VGG-Face model, can be converted into a B-CNN without any additional feature training.
ImageNet classification with deep convolutional neural networks
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
On Vectorization of Deep Convolutional Neural Networks for Vision Tasks
This paper studied the vectorization process of key building blocks in deep CNNs, in order to better understand and facilitate parallel implementation, and developed and compared six implementations with various degrees of vectorization.
Going deeper with convolutions
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition
Network In Network
With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.