Factorized Bilinear Models for Image Recognition
@article{Li2017FactorizedBM,
  title   = {Factorized Bilinear Models for Image Recognition},
  author  = {Yanghao Li and Naiyan Wang and Jiaying Liu and Xiaodi Hou},
  journal = {2017 IEEE International Conference on Computer Vision (ICCV)},
  year    = {2017},
  pages   = {2098-2106}
}
Although Deep Convolutional Neural Networks (CNNs) have liberated their power in various computer vision tasks, the most important components of CNN, convolutional layers and fully connected layers, are still limited to linear transformations. In this paper, we propose a novel Factorized Bilinear (FB) layer to model the pairwise feature interactions by considering the quadratic terms in the transformations. Compared with existing methods that tried to incorporate complex non-linearity…
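The quadratic interaction the abstract describes can be sketched in a few lines. This is a minimal NumPy illustration of the factorized bilinear form y = b + wᵀx + xᵀFᵀFx for a single output unit, where F is a rank-k factor matrix; the function name and shapes are illustrative, not the paper's implementation:

```python
import numpy as np

def factorized_bilinear(x, w, b, F):
    """One FB output unit: y = b + w^T x + x^T F^T F x.

    x: (n,) input features; w: (n,) linear weights; b: scalar bias;
    F: (k, n) factor matrix, a rank-k factorization of the full
    n-by-n interaction matrix, so pairwise terms cost O(k*n) not O(n^2).
    """
    fx = F @ x                    # project x into the k-dim factor space
    return b + w @ x + fx @ fx    # linear term + low-rank quadratic term

rng = np.random.default_rng(0)
n, k = 8, 3
x = rng.standard_normal(n)
w = rng.standard_normal(n)
F = rng.standard_normal((k, n))
y = factorized_bilinear(x, w, 0.5, F)
```

The factorization is what keeps the quadratic term tractable: the full interaction matrix FᵀF is never materialized, only the k-dimensional projection Fx.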
60 Citations
Learning Deep Bilinear Transformation for Fine-grained Image Representation
- Computer ScienceNeurIPS
- 2019
A deep bilinear transformation (DBT) block, which can be deeply stacked in convolutional neural networks to learn fine-grained image representations, and achieves new state-of-the-art in several fine-grained image recognition benchmarks, including CUB-Bird, Stanford-Car, and FGVC-Aircraft.
Global Second-Order Pooling Convolutional Networks
- Computer Science2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2019
A novel network model is proposed that introduces GSoP from lower to higher layers, exploiting the second-order statistics of the holistic image throughout the network.
Graph Convolutional Network with Generalized Factorized Bilinear Aggregation
- Computer ScienceArXiv
- 2021
A novel generalization of Factorized Bilinear (FB) layer to model the feature interactions in GCNs by defining a family of summarizing operators applied over the quadratic term and demonstrating that the GFB-GCN is competitive with other methods for text classification.
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks
- Computer Science2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper proposes an Efficient Channel Attention (ECA) module, which only involves a handful of parameters while bringing clear performance gain, and develops a method to adaptively select kernel size of 1D convolution, determining coverage of local cross-channel interaction.
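The mechanism this snippet summarizes is simple enough to sketch: pool each channel to a scalar, run a small 1-D convolution across the channel descriptors, and gate the feature map with a sigmoid. A hedged NumPy illustration (uniform kernel weights stand in for the learned 1-D convolution; kernel size is fixed rather than adaptively selected):

```python
import numpy as np

def eca(feature_map, k=3):
    """Sketch of Efficient Channel Attention for a (C, H, W) feature map.

    GAP per channel -> 1-D convolution of size k across channels
    (local cross-channel interaction) -> sigmoid gate -> rescale.
    """
    C = feature_map.shape[0]
    gap = feature_map.reshape(C, -1).mean(axis=1)      # (C,) channel descriptor
    kernel = np.full(k, 1.0 / k)                       # placeholder for learned weights
    padded = np.pad(gap, k // 2, mode="edge")          # same-length output for odd k
    conv = np.convolve(padded, kernel, mode="valid")   # (C,) cross-channel mixing
    gate = 1.0 / (1.0 + np.exp(-conv))                 # sigmoid attention weights
    return feature_map * gate[:, None, None]

fmap = np.random.default_rng(1).standard_normal((16, 4, 4))
out = eca(fmap)
```

The point of the design is the parameter count: a single length-k kernel shared across channels, versus the two fully connected layers of squeeze-and-excitation blocks.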
Approximated Bilinear Modules for Temporal Modeling
- Computer Science2019 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2019
It is shown how two-layer subnets in CNNs can be converted to temporal bilinear modules by adding an auxiliary branch, and snippet sampling and shifting inference are introduced to boost sparse-frame video classification performance.
Fine-grained visual classification via multilayer bilinear pooling with object localization
- Computer ScienceVis. Comput.
- 2022
This paper proposes a multilayer bilinear pooling model combined with object localization, which can achieve competitive performance compared with several state-of-the-art methods on fine-grained visual classification tasks.
Temporal Bilinear Networks for Video Action Recognition
- Computer ScienceAAAI
- 2019
This paper proposes a novel Temporal Bilinear (TB) model to capture the temporal pairwise feature interactions between adjacent frames and considers explicit quadratic bilinear transformations in the temporal domain for motion evolution and sequential relation modeling.
Second-Order Response Transform Attention Network for Image Classification
- Computer ScienceIEEE Access
- 2019
This work proposes a novel Second-order Response Transform Attention Network (SoRTA-Net) for classification tasks, which can be flexibly inserted into existing CNNs without any modification of network topology.
Multi-modal Factorized Bilinear Pooling with Co-attention Learning for Visual Question Answering
- Computer Science2017 IEEE International Conference on Computer Vision (ICCV)
- 2017
A Multi-modal Factorized Bilinear (MFB) pooling approach to efficiently and effectively combine multi-modal features, which results in superior performance for VQA compared with other bilinear pooling approaches.
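The MFB idea extends the same low-rank factorization to two modalities: project both inputs, take their element-wise product, and sum-pool each group of k factors. A minimal NumPy sketch (matrix names, sizes, and the power/L2 normalization step are illustrative conventions, not the paper's exact pipeline):

```python
import numpy as np

def mfb_pool(x, y, U, V, k):
    """Sketch of Multi-modal Factorized Bilinear pooling.

    x: (nx,) and y: (ny,) features from two modalities;
    U: (o*k, nx), V: (o*k, ny) projection matrices.
    Returns an (o,) fused feature: each output is a rank-k
    bilinear interaction between x and y.
    """
    z = (U @ x) * (V @ y)              # (o*k,) element-wise joint embedding
    z = z.reshape(-1, k).sum(axis=1)   # sum-pool each factor group -> (o,)
    z = np.sign(z) * np.sqrt(np.abs(z))        # signed square-root (power norm)
    return z / (np.linalg.norm(z) + 1e-12)     # L2 normalization

rng = np.random.default_rng(2)
x, y = rng.standard_normal(10), rng.standard_normal(12)
U, V = rng.standard_normal((5 * 4, 10)), rng.standard_normal((5 * 4, 12))
z = mfb_pool(x, y, U, V, k=4)
```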
References
Showing 1-10 of 50 references
Return of the Devil in the Details: Delving Deep into Convolutional Nets
- Computer ScienceBMVC
- 2014
It is shown that the data augmentation techniques commonly applied to CNN-based methods can also be applied to shallow methods, and result in an analogous performance boost, and it is identified that the dimensionality of the CNN output layer can be reduced significantly without having an adverse effect on performance.
Bilinear CNN Models for Fine-Grained Visual Recognition
- Computer Science2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
We propose bilinear models, a recognition architecture that consists of two feature extractors whose outputs are multiplied using outer product at each location of the image and pooled to obtain an…
Deeply-Supervised Nets
- Computer ScienceAISTATS
- 2015
The proposed deeply-supervised nets (DSN) method simultaneously minimizes classification error while making the learning process of hidden layers direct and transparent, and extends techniques from stochastic gradient methods to analyze the algorithm.
One-to-many face recognition with bilinear CNNs
- Computer Science2016 IEEE Winter Conference on Applications of Computer Vision (WACV)
- 2016
This work applies the bilinear CNN model to the challenging new face recognition benchmark, the IARPA Janus Benchmark A (IJB-A), and demonstrates how a standard CNN pre-trained on a large face database, the recently released VGG-Face model, can be converted into a B-CNN without any additional feature training.
ImageNet classification with deep convolutional neural networks
- Computer ScienceCommun. ACM
- 2012
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
On Vectorization of Deep Convolutional Neural Networks for Vision Tasks
- Computer ScienceAAAI
- 2015
This paper studied the vectorization process of key building blocks in deep CNNs, in order to better understand and facilitate parallel implementation, and developed and compared six implementations with various degrees of vectorization.
Going deeper with convolutions
- Computer Science2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition…
Network In Network
- Computer ScienceICLR
- 2014
With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
- Computer Science2015 IEEE International Conference on Computer Vision (ICCV)
- 2015
This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- Computer ScienceIEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.