Incorporating intra-class variance to fine-grained visual recognition

  • Yan Em, Feng Gao, Yihang Lou, Shiqi Wang, Tiejun Huang, Ling-yu Duan
  • Published 1 March 2017
  • Computer Science
  • 2017 IEEE International Conference on Multimedia and Expo (ICME)
Fine-grained visual recognition aims to capture discriminative characteristics amongst visually similar categories. The state-of-the-art research work has significantly improved the fine-grained recognition performance by deep metric learning using triplet network. However, the impact of intra-category variance on the performance of recognition and robust feature representation has not been well studied. In this paper, we propose to leverage intra-class variance in metric learning of triplet… 
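
For orientation, the generic triplet loss that such triplet networks typically optimize can be sketched as follows. This is a minimal illustration of the standard hinge formulation on squared Euclidean distances, not the intra-class-variance variant proposed in the paper (the abstract above is truncated before that is described). The function names, the toy embeddings, and the margin value 0.2 are assumptions chosen for the example.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet hinge loss (illustrative, not the paper's variant).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin` (a hypothetical default value here).
    """
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


# Toy 2-D embeddings: the first triplet already satisfies the margin,
# the second violates it and therefore yields a positive loss.
easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])
hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0])
```

In practice the embeddings would come from a shared CNN applied to the anchor, positive, and negative images, and the loss would be averaged over a mini-batch of triplets.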

Ranking-based triplet loss function with intra-class mean and variance for fine-grained classification tasks

A modified distance criterion described in the current work leverages the intra-category variance in metric learning of a triplet network by learning a local sample structure to efficiently learn a similarity metric from top-ranked images.

Are These Birds Similar: Learning Branched Networks for Fine-grained Representations

This paper leverages natural language descriptions and proposes a strategy for learning a joint representation of natural language descriptions and images using a two-branch network with multiple layers to improve fine-grained classification.

Adaptive Bilinear Pooling for Fine-grained Representation Learning

This work proposes a generalized feature interaction method, named Adaptive Bilinear Pooling (ABP), which can adaptively infer a suitable pooling strategy for a given sample based on image content, and demonstrates the effectiveness of the proposed method on three widely used benchmarks.

Inter-intra Variant Dual Representations for Self-supervised Video Recognition

This paper proposes to learn dual representations for each clip which encode intra-variance through a shuffle-rank pretext task and encode inter-variances through a temporal coherent contrastive loss, and shows that this method plays an essential role in balancing inter and intra variances.

MIC: Mining Interclass Characteristics for Improved Metric Learning

This work proposes a novel surrogate task to learn visual characteristics shared across classes with a separate encoder, trained jointly with the encoder for class information by reducing their mutual information.

Global Topology Constraint Network for Fine-Grained Vehicle Recognition

A global topology constraint network for fine-grained vehicle recognition is proposed, which adopts the constraint of global topology relationships to depict the interaction between parts and integrates it into a CNN in an efficient way.

Improving Deep Metric Learning by Divide and Conquer

This approach significantly improves upon the state-of-the-art on image retrieval, clustering, and re-identification tasks evaluated using CUB200-2011, CARS196, Stanford Online Products, In-shop Clothes, and PKU VehicleID datasets.

Deep Fourier Ranking Quantization for Semi-Supervised Image Retrieval

A new Fourier perspective is introduced to alleviate this issue by exploring the semantic relations between unlabeled instances in a self-supervised manner; the method outperforms existing state-of-the-art methods by an average of 3.95% on four datasets.

How Incomplete is Contrastive Learning? An Inter-intra Variant Dual Representation Method for Self-supervised Video Recognition

This paper proposes to learn dual representations for each clip which encode intra-variance through a shuffle-rank pretext task and encode inter-variances through a temporal coherent contrastive loss, and shows that this method plays an essential role in balancing inter and intra variances.



Embedding Label Structures for Fine-Grained Feature Representation

The proposed multitask learning framework significantly outperforms previous fine-grained feature representations for image retrieval at different levels of relevance. To model the multi-level relevance, label structures such as hierarchy or shared attributes are seamlessly embedded into the framework by generalizing the triplet loss.

The application of two-level attention models in deep convolutional neural network for fine-grained image classification

This paper proposes to apply visual attention to the fine-grained classification task using a deep neural network; it achieves the best accuracy under the weakest supervision condition and is competitive against other methods that rely on additional annotations.

Fine-grained visual categorization via multi-stage metric learning

A multi-stage metric learning framework is proposed that divides the large-scale, high-dimensional learning problem into a series of simple subproblems, achieving O(d) computational complexity.

Part-Based R-CNNs for Fine-Grained Category Detection

This work proposes a model for fine-grained categorization that overcomes limitations by leveraging deep convolutional features computed on bottom-up region proposals. It learns whole-object and part detectors, enforces learned geometric constraints between them, and predicts a fine-grained category from a pose-normalized representation.

Learning Fine-Grained Image Similarity with Deep Ranking

A deep ranking model that employs deep learning techniques to learn a similarity metric directly from images has higher learning capability than models based on hand-crafted features and deep classification models.

Deep feature learning with relative distance comparison for person re-identification

3D Object Representations for Fine-Grained Categorization

This paper lifts two state-of-the-art 2D object representations to 3D, on the level of both local feature appearance and location, and shows their efficacy for estimating 3D geometry from images via ultra-wide baseline matching and 3D reconstruction.

Fine-Grained Image Classification by Exploring Bipartite-Graph Labels

  • Feng Zhou, Yuanqing Lin
  • Computer Science
  • 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2016
This paper shows how to model bipartite-graph labels (BGL) in an overall convolutional neural network so that the resulting system can be optimized through back-propagation, and shows that inference is computationally efficient thanks to the bipartite structure.

An improved deep learning architecture for person re-identification

This work presents a deep convolutional architecture with layers specially designed to address the problem of re-identification, and significantly outperforms the state of the art on both a large data set and a medium-sized data set, and is resistant to over-fitting.

Novel Dataset for Fine-Grained Image Categorization : Stanford Dogs

A 120-class Stanford Dogs dataset, a challenging and large-scale dataset aimed at fine-grained image categorization, is introduced, which includes over 22,000 annotated images of dogs belonging to 120 breeds.