A Compositional Feature Embedding and Similarity Metric for Ultra-Fine-Grained Visual Categorization

Yajie Sun, Miaohua Zhang, Xiaohan Yu, Yi Liao, Yongsheng Gao
2021 Digital Image Computing: Techniques and Applications (DICTA)
Fine-grained visual categorization (FGVC), which aims to classify objects with small inter-class variance, has advanced significantly in recent years. However, ultra-fine-grained visual categorization (ultra-FGVC), which aims to identify subclasses with extremely similar patterns, has not received much attention. In ultra-FGVC datasets, the samples per category are always scarce as the granularity becomes finer, which leads to overfitting. Moreover, the difference among…


Destruction and Construction Learning for Fine-Grained Image Recognition
Proposes a novel "Destruction and Construction Learning" (DCL) method that increases the difficulty of fine-grained recognition and trains the classification model to acquire expert knowledge.
MaskCOV: A random mask covariance network for ultra-fine-grained visual categorization
Improved Bilinear Pooling with CNNs
This paper investigates various ways of normalizing second-order statistics of convolutional features to improve their representation power. It finds that matrix square-root normalization offers significant improvements and outperforms alternative schemes such as matrix logarithm normalization when combined with elementwise square-root and l2 normalization.
Benchmark Platform for Ultra-Fine-Grained Visual Categorization Beyond Human Performance
The proposed UFG image dataset and evaluation protocols are intended to serve as a benchmark platform that can advance visual classification research from approaching human performance to going beyond human ability, by providing benchmark data for artificial intelligence (AI) that is not limited by the labels of human intelligence (HI).
Learning discriminative region representation for person retrieval
Learning deep part-aware embedding for person retrieval
Patchy Image Structure Classification Using Multi-Orientation Region Transform
Experimental results demonstrate the effectiveness and superiority of the proposed MORT over state-of-the-art methods in classifying patchy image structures. The method can also be extended to combine with deep convolutional neural network techniques for further enhancement of classification accuracy.
Contour Covariance: A Fast Descriptor for Classification
The proposed descriptor, termed contour covariance (CC), uses covariance features driven by a moving point on the shape contour at multiple scales to effectively and efficiently characterize the local image statistics.
MobileFAN: Transferring Deep Hidden Representation for Face Alignment
Multiscale Contour Steered Region Integral and Its Application for Cultivar Classification
A novel multiscale region transform (MReT) is proposed that performs region integrals over different contour-steered strips at all possible scales to effectively integrate patch features, enabling a better description of the shape image in a coarse-to-fine manner.