Multi-Objective Matrix Normalization for Fine-Grained Visual Recognition

@article{Min2020MultiObjectiveMN,
  title={Multi-Objective Matrix Normalization for Fine-Grained Visual Recognition},
  author={Shaobo Min and Hantao Yao and Hongtao Xie and Zhengjun Zha and Yongdong Zhang},
  journal={IEEE Transactions on Image Processing},
  year={2020},
  volume={29},
  pages={4996-5009}
}
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC). Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features, but some problems, e.g., redundant information and over-fitting, remain to be resolved. In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity…
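The abstract names three properties for the normalized bilinear representation. As a rough illustration only, the sketch below (NumPy, assuming a conv feature map laid out as channels × height × width) forms the bilinear matrix A = XXᵀ/n and evaluates three hypothetical penalty terms for a candidate normalized matrix Y: a square-root residual ‖YY − A‖²_F, the nuclear norm as a low-rank surrogate, and the ℓ1 norm as a sparsity surrogate. It is not MOMN's optimization procedure; the function names and the choice of surrogates are assumptions.

```python
import numpy as np

def bilinear_pool(feature_map):
    """Average outer product of local descriptors.

    feature_map: (c, h, w) conv activations (assumed layout).
    Returns the symmetric PSD c x c matrix A = X X^T / n, with n = h * w.
    """
    c = feature_map.shape[0]
    x = feature_map.reshape(c, -1)            # c x n, one column per spatial location
    return x @ x.T / x.shape[1]

def momn_style_penalties(y, a):
    """Hypothetical penalty terms for a candidate normalized matrix y
    (an illustration, not MOMN's actual objective or solver)."""
    sqrt_residual = np.linalg.norm(y @ y - a, "fro") ** 2   # y should square to a
    nuclear = np.linalg.norm(y, "nuc")                        # convex surrogate for rank
    l1 = np.abs(y).sum()                                      # convex surrogate for sparsity
    return sqrt_residual, nuclear, l1
```

For Y equal to the exact matrix square root of A, the first term is zero up to round-off, while the other two measure how far Y is from being low-rank and sparse; trading these terms off against each other is the kind of balance a multi-objective normalization has to strike.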
Fine-Grained Image Analysis with Deep Learning: A Survey
  • Xiu-Shen Wei, Yi-Zhe Song, +5 authors Serge J. Belongie
  • Computer Science, Medicine
  • IEEE transactions on pattern analysis and machine intelligence
  • 2021
TLDR
A systematic survey of recent advances in deep-learning-powered fine-grained image analysis (FGIA) is presented, which attempts to redefine and broaden the field of FGIA by consolidating two fundamental fine-grained research areas: fine-grained image recognition and fine-grained image retrieval.
Fine-Grained Visual Computing Based on Deep Learning
TLDR
The constructed fine-grained image categorization model achieves higher recognition accuracy, shorter training time, and significantly better performance on similar-feature effects, providing an experimental reference for future visual computing of fine-grained images.
Multilayer feature fusion with parallel convolutional block for fine-grained image classification
  • Lei Wang, Kai He, Xu Feng, Xitao Ma
  • Computer Science
  • Applied Intelligence
  • 2021
TLDR
A multilayer feature fusion network with a parallel convolutional block (PCB) mechanism is proposed for fine-grained image classification; it provides more effective residual connections for extracting the region of interest (ROI) and uses parallel convolutions with kernels of different sizes.
Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation
TLDR
A novel re-aggregation-based framework for weakly-supervised video object segmentation (WVOS) is proposed, which uses feature matching to efficiently find the target and captures temporal dependencies from multiple frames to guide segmentation.
Frequency-aware Discriminative Feature Learning Supervised by Single-Center Loss for Face Forgery Detection
TLDR
A novel single-center loss (SCL) is designed that compresses only the intra-class variations of natural faces while boosting inter-class differences in the embedding space, so that more discriminative features can be learned with less optimization difficulty.
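A minimal sketch of a single-center loss, assuming the formulation in which the mean distance of natural-face embeddings to one shared center is minimized while a hinge term with margin m√D keeps manipulated faces farther from that center on average. The label encoding (1 = natural, 0 = manipulated), the fixed `center` argument (learned in practice), and the default margin are assumptions for illustration.

```python
import numpy as np

def single_center_loss(features, labels, center, margin=0.3):
    """Sketch of a single-center loss.

    features: (N, D) embeddings; labels: (N,) with 1 = natural, 0 = manipulated
    (hypothetical encoding); center: (D,) shared center; the margin is scaled by sqrt(D).
    Assumes both classes are present in the batch.
    """
    d = np.linalg.norm(features - center, axis=1)   # distance of each embedding to the center
    m_nat = d[labels == 1].mean()                   # mean distance of natural faces
    m_man = d[labels == 0].mean()                   # mean distance of manipulated faces
    hinge = max(m_nat - m_man + margin * np.sqrt(features.shape[1]), 0.0)
    return m_nat + hinge                            # compact natural faces, push manipulated ones away
```

Only the natural class is pulled toward the center; manipulated faces are constrained only relative to it, which matches the idea of compressing intra-class variation for one class alone.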
Large Scale Visual Food Recognition
TLDR
This paper introduces Food2K, which is the largest food recognition dataset with 2,000 categories and over 1 million images, and proposes a deep progressive region enhancement network for food recognition, which mainly consists of two components, namely progressive local feature learning and region feature enhancement.
A Mutually Attentive Co-Training Framework for Semi-Supervised Recognition
TLDR
A novel Mutually Attentive Co-training Framework (MACF) is proposed that can effectively alleviate the negative impact of incorrect labels on model retraining by exploring deep model disagreements and improving the pseudo-labels by aggregating the predictions from multiple models and data transformations.
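The pseudo-label aggregation step can be illustrated with plain probability averaging over models and data transformations. This is an assumed simplification: the array layout, the confidence threshold, and the function names are hypothetical, and MACF's actual weighting may differ.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def aggregate_pseudo_labels(logits, threshold=0.0):
    """logits: (num_models, num_transforms, N, C) predictions on unlabeled samples.

    Averages class probabilities over models and transforms, then returns
    (labels, confidence, mask) where mask keeps samples with confidence >= threshold.
    """
    probs = softmax(logits).mean(axis=(0, 1))          # (N, C) ensemble probabilities
    labels = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    return labels, confidence, confidence >= threshold
```

Samples on which the models disagree end up with a flatter averaged distribution and lower confidence, so a threshold filters them out before retraining.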
Bag of Tricks for Building an Accurate and Slim Object Detector for Embedded Applications
TLDR
A bag of tricks is explored that improves detection performance for a specific on-road application without increasing the computational cost of YOLOv5s.
Hierarchical Granularity Transfer Learning
TLDR
A novel Bi-granularity Semantic Preserving Network (BigSPN) is proposed to bridge the granularity gap for robust knowledge transfer. Experiments on three benchmarks with hierarchical granularities show that BigSPN is an effective framework for Hierarchical Granularity Transfer Learning.
Mask or Non-Mask? Robust Face Mask Detector via Triplet-Consistency Representation Learning
TLDR
A face mask detection framework is proposed that uses a context attention module to refine the attention maps of a feed-forward convolutional neural network, together with an anchor-free detector trained with Triplet-Consistency Representation Learning, which integrates a consistency loss and a triplet loss.

References

Showing 1-10 of 80 references
Low-Rank Bilinear Pooling for Fine-Grained Classification
TLDR
This work proposes a classifier co-decomposition that factorizes the collection of bilinear classifiers into a common factor and compact per-class terms and achieves state-of-the-art performance on several public datasets for fine-grained classification trained with only category labels.
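Why a low-rank bilinear classifier saves computation can be shown with a small identity. Assuming a symmetric classifier of the form W = UUᵀ − VVᵀ (a parameterization assumed here for illustration; the per-class co-decomposition itself is not shown), the bilinear score ⟨W, XXᵀ⟩ equals ‖UᵀX‖²_F − ‖VᵀX‖²_F, so the c × c bilinear feature never has to be formed. The function names are hypothetical.

```python
import numpy as np

def full_bilinear_score(w, x):
    """<W, X X^T>: requires the c x c bilinear feature explicitly."""
    return np.sum(w * (x @ x.T))

def low_rank_score(u, v, x):
    """Same score when W = U U^T - V V^T, computed from r-dim projections only."""
    return np.sum((u.T @ x) ** 2) - np.sum((v.T @ x) ** 2)
```

With rank r much smaller than c, the second form costs O(rcn) instead of O(c²n) and stores 2rc classifier parameters instead of c².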
Grassmann Pooling as Compact Homogeneous Bilinear Pooling for Fine-Grained Visual Classification
TLDR
An alternative pooling method is proposed that transforms the CNN feature matrix into an orthonormal matrix consisting of its principal singular vectors, enabling very compact feature and classifier representations on a variety of fine-grained image classification datasets.
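A sketch of the core Grassmann pooling step, assuming the feature matrix is laid out as channels × spatial locations: keep the top-k left singular vectors as an orthonormal c × k basis (c·k numbers instead of c²). The projection similarity ‖PᵀQ‖²_F is shown as one common way to compare such subspaces; whether the paper uses exactly this kernel, and the function names, are assumptions.

```python
import numpy as np

def grassmann_pool(x, k):
    """x: (c, n) feature matrix (channels x locations).
    Returns a (c, k) orthonormal basis of the top-k left singular vectors."""
    u, _, _ = np.linalg.svd(x, full_matrices=False)   # singular values come sorted descending
    return u[:, :k]

def subspace_similarity(p, q):
    """||P^T Q||_F^2: depends only on the spanned subspaces, not on the basis
    (e.g. sign flips of singular vectors)."""
    return np.linalg.norm(p.T @ q, "fro") ** 2
```

The left singular vectors of X are the leading eigenvectors of the bilinear matrix XXᵀ, which is why this basis can be viewed as a compact stand-in for the full bilinear feature.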
Improved Bilinear Pooling with CNNs
TLDR
This paper investigates various ways of normalizing second-order statistics of convolutional features to improve their representation power and finds that the matrix square-root normalization offers significant improvements and outperforms alternative schemes such as the matrix logarithm normalization when combined with elementwise square-root and ℓ2 normalization.
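Matrix square-root normalization is often computed with coupled Newton-Schulz iterations, which need only matrix multiplications. The sketch below assumes a symmetric positive definite input and a fixed iteration count (both assumptions), then applies the elementwise signed square root and ℓ2 normalization mentioned above. It is one way to realize the normalization, not necessarily the paper's exact implementation.

```python
import numpy as np

def newton_schulz_sqrt(a, num_iters=20):
    """Approximate the square root of a symmetric positive definite matrix
    with coupled Newton-Schulz iterations (matrix multiplications only)."""
    c = a.shape[0]
    norm = np.linalg.norm(a, "fro")
    y = a / norm                     # pre-normalize so eigenvalues lie in (0, 1] and the iteration converges
    z = np.eye(c)
    for _ in range(num_iters):
        t = 0.5 * (3.0 * np.eye(c) - z @ y)
        y = y @ t                    # y -> sqrt(a / norm)
        z = t @ z                    # z -> inverse of sqrt(a / norm)
    return y * np.sqrt(norm)         # undo the normalization

def improved_bilinear_feature(a, num_iters=20):
    """Matrix square root, then elementwise signed square root and l2 normalization."""
    y = newton_schulz_sqrt(a, num_iters)
    f = y.ravel()
    f = np.sign(f) * np.sqrt(np.abs(f))
    return f / np.linalg.norm(f)
```

Compared with an eigendecomposition, the iteration maps well onto GPUs; the trade-off is that accuracy depends on the iteration count and on the conditioning of the input.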
Hierarchical Bilinear Pooling for Fine-Grained Visual Recognition
TLDR
A cross-layer bilinear pooling approach is proposed to capture inter-layer part feature relations, which results in superior performance compared with other bilinear-pooling-based approaches.
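A hedged sketch of factorized cross-layer bilinear pooling: features from two layers at the same spatial locations are projected, multiplied elementwise, sum-pooled over locations, and projected again. The projection shapes and names are hypothetical. With a single all-ones output projection, the result reduces to Σᵢ xᵢᵀ U Vᵀ yᵢ, a low-rank bilinear form between the two layers.

```python
import numpy as np

def cross_layer_bilinear(x, y, u, v, p):
    """x: (c1, n), y: (c2, n) features from two layers at the same n locations.
    u: (c1, d), v: (c2, d), p: (d, o) projections (hypothetical shapes).
    Returns an o-dim feature."""
    interaction = (u.T @ x) * (v.T @ y)       # (d, n) factorized cross-layer interaction
    pooled = interaction.sum(axis=1)          # sum-pool over spatial locations
    return p.T @ pooled                       # final projection to o dims
```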
Adaptive Bilinear Pooling for Fine-grained Representation Learning
TLDR
This work proposes a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which can adaptively infer a suitable pooling strategy for a given sample based on image content, and demonstrates the effectiveness of the proposed method on three widely used benchmarks.
Compact Bilinear Pooling
  • Yang Gao, Oscar Beijbom, Ning Zhang, Trevor Darrell
  • Computer Science
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
TLDR
Two compact bilinear representations are proposed with the same discriminative power as the full bilinear representation but with only a few thousand dimensions, allowing back-propagation of classification errors and enabling end-to-end optimization of the visual recognition system.
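The Tensor Sketch variant of compact bilinear pooling can be sketched as follows: count-sketch the descriptor twice with independent hashes and signs, multiply the FFTs, and invert. The result equals a count sketch of the flattened outer product xxᵀ without ever forming it. Random hashes and a single descriptor are assumptions for illustration; in a network the sketch would be applied per location and sum-pooled, with the hashes fixed at initialization.

```python
import numpy as np

def count_sketch(x, h, s, d):
    """Project x (length c) to d dims: entry i is added to bucket h[i] with sign s[i]."""
    out = np.zeros(d)
    np.add.at(out, h, s * x)                  # handles repeated bucket indices correctly
    return out

def tensor_sketch(x, h1, s1, h2, s2, d):
    """d-dim sketch of the outer product x x^T: the circular convolution of two
    count sketches, computed as a product in the Fourier domain."""
    p1 = np.fft.rfft(count_sketch(x, h1, s1, d))
    p2 = np.fft.rfft(count_sketch(x, h2, s2, d))
    return np.fft.irfft(p1 * p2, n=d)
```

The sketch of xxᵀ is the count sketch that sends pair (i, j) to bucket (h1[i] + h2[j]) mod d with sign s1[i]·s2[j], so its cost is O(c + d log d) instead of O(c²).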
Is Second-Order Information Helpful for Large-Scale Visual Recognition?
TLDR
A Matrix Power Normalized Covariance (MPN-COV) method is proposed that develops forward and backward propagation formulas for the nonlinear matrix functions so that MPN-COV can be trained end-to-end; its advantage over the well-known Log-Euclidean metric is analyzed both qualitatively and quantitatively.
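The MPN-COV forward pass can be sketched as covariance pooling followed by a matrix power computed from an eigendecomposition, Σ^α = U diag(λ^α) Uᵀ. The 1/n normalization, the default α = 0.5, and the eigenvalue clipping are assumptions; the backward-propagation formulas derived in the paper are not shown.

```python
import numpy as np

def covariance_pool(x):
    """x: (c, n) local features; returns the c x c sample covariance (1/n normalization)."""
    xc = x - x.mean(axis=1, keepdims=True)    # center each channel over spatial locations
    return xc @ xc.T / x.shape[1]

def matrix_power_normalize(sigma, alpha=0.5, eps=1e-12):
    """Sigma^alpha = U diag(lambda^alpha) U^T via eigendecomposition."""
    lam, u = np.linalg.eigh(sigma)
    lam = np.clip(lam, eps, None)             # covariance is PSD; clip round-off negatives
    return (u * lam ** alpha) @ u.T
```

With α = 1 the covariance is returned unchanged, and α = 0.5 gives the matrix square root, so the power acts as a tunable compression of the eigenvalue spectrum.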
MoNet: Moments Embedding Network
TLDR
This paper unifies bilinear pooling and global Gaussian embedding layers through the empirical moment matrix and proposes a novel sub-matrix square-root layer, which can normalize the output of the convolution layer directly and mitigate the dimensionality problem with off-the-shelf compact pooling methods.
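The unification through the empirical moment matrix can be illustrated directly: appending a constant 1 to every descriptor and averaging outer products yields a block matrix whose lower-right block is the (uncentered) bilinear pooling output and whose first row and column hold the mean. The function name is hypothetical, and the sub-matrix square-root layer is not shown.

```python
import numpy as np

def moment_matrix(x):
    """x: (c, n) local features. Augments each descriptor with a constant 1 and
    averages outer products, giving [[1, mu^T], [mu, Sigma + mu mu^T]]."""
    n = x.shape[1]
    x_aug = np.vstack([np.ones((1, n)), x])   # (c + 1, n): prepend a row of ones
    return x_aug @ x_aug.T / n
```

Because the matrix carries both the first- and second-order statistics, it plays the role of a Gaussian embedding while its lower-right block is exactly the bilinear feature XXᵀ/n.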
Generalized Orderless Pooling Performs Implicit Salient Matching
TLDR
This paper generalizes average and bilinear pooling to “α-pooling”, allowing for learning the pooling strategy during training, and presents a novel way to visualize decisions made by these approaches.
Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition
TLDR
A novel attention-based convolutional neural network (CNN) is proposed that regulates multiple object parts across different input images; it can be easily trained end-to-end and is highly efficient, requiring only one training stage.