Multi-label Image Recognition by Recurrently Discovering Attentional Regions

@inproceedings{Wang2017MultilabelIR,
  title={Multi-label Image Recognition by Recurrently Discovering Attentional Regions},
  author={Zhouxia Wang and Tianshui Chen and Guanbin Li and Ruijia Xu and Liang Lin},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={464--472}
}
This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This…


Multi-label image classification with recurrently learning semantic dependencies
TLDR
This paper proposes a novel multi-label image classification framework that improves on the CNN–RNN design pattern and demonstrates that the model can effectively exploit the correlation between tags to improve classification performance and to better recognize small targets.
Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition
TLDR
A simple but efficient two-stream framework to recognize multi-category objects from the global image down to local regions, similar to how human beings perceive objects, which aims to make the number of attentional regions as small as possible while keeping the diversity of these regions as high as possible.
Semantic Representation and Dependency Learning for Multi-Label Image Recognition
TLDR
A novel and effective semantic representation and dependency learning (SRDL) framework to learn category-specific semantic representation for each category and capture semantic dependency among all categories.
Multi-Label Image Recognition with Multi-Class Attentional Regions
TLDR
This paper proposes a simple but efficient two-stream framework to recognize multi-category objects from the global image down to local regions, similar to how human beings perceive objects, in order to bridge the gap between global and local streams.
Diverse Instance Discovery: Vision-Transformer for Instance-Aware Multi-Label Image Recognition
TLDR
This work leverages ViT’s patch tokens and self-attention mechanism to mine rich instances in multi-label images, a process named diverse instance discovery (DiD), and proposes a semantic category-aware module and a spatial relationship-aware module.
Attention-Based Dual-Branch Cascade Network for Multi-Label Image Recognition
  • Xingyu Li
  • Computer Science
    Journal of Physics: Conference Series
  • 2022
TLDR
A unified deep neural network named Attention-Based Dual-Branch Cascade Network (ADC Net), which contains a Main Branch and an Auxiliary Branch that cascade to predict labels and obtain the final result by element-wise addition.
Residual Attention: A Simple but Effective Method for Multi-Label Recognition
  • Ke Zhu, Jianxin Wu
  • Computer Science
    2021 IEEE/CVF International Conference on Computer Vision (ICCV)
  • 2021
TLDR
This work proposes an embarrassingly simple module, named class-specific residual attention (CSRA), which generates class-specific features for every category via a simple spatial attention score, then combines them with the class-agnostic average-pooling feature.
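The residual-attention idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration (not the authors' implementation): it assumes per-class classifier logits are already available at each spatial location, and the function name, `lam`, and `temp` parameters are placeholders chosen here for clarity.

```python
import math

def csra_scores(loc_logits, lam=0.1, temp=1.0):
    """Sketch of class-specific residual attention (CSRA) aggregation.

    loc_logits[k][i] is the classifier logit for class k at spatial
    location i. For each class, the final score is the class-agnostic
    average-pooled logit plus lam times a spatial-softmax-weighted
    (attention-pooled) logit.
    """
    out = []
    for scores in loc_logits:
        base = sum(scores) / len(scores)           # average pooling
        m = max(s * temp for s in scores)          # subtract max for stability
        w = [math.exp(s * temp - m) for s in scores]
        z = sum(w)
        att = sum(wi / z * s for wi, s in zip(w, scores))  # attention pooling
        out.append(base + lam * att)
    return out
```

As the temperature grows, the attention term approaches spatial max pooling, so the residual branch emphasizes the single most confident location for each class.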
Bi-Modal Learning With Channel-Wise Attention for Multi-Label Image Classification
TLDR
This paper proposes a novel CNN-RNN-based model, the bi-modal multi-label learning (BMML) framework; based on the assumption that objects in a semantic scene always have high-level relevance across visual and textual corpora, it embeds the labels through different pre-trained language models and determines the label sequence in a “semantic space” constructed on large-scale textual data.
Knowledge-Guided Multi-Label Few-Shot Learning for General Image Recognition
TLDR
A knowledge-guided graph routing (KGGR) framework, which unifies prior knowledge of statistical label correlations with deep neural networks and can facilitate exploiting the information of correlated labels to help train better classifiers, especially for labels with limited training samples.
Multi-layered Semantic Representation Network for Multi-label Image Classification
TLDR
A Multi-layered Semantic Representation Network (MSRN) is designed which discovers both local and global semantics of labels through modeling label correlations and utilizes the label semantics to guide the semantic representations learning at multiple layers through an attention mechanism.

References

SHOWING 1-10 OF 33 REFERENCES
Multilabel Image Classification With Regional Latent Semantic Dependencies
TLDR
The proposed RLSD achieves the best performance compared to the state-of-the-art models, especially for predicting small objects occurring in the images, and can approach the upper bound without using the bounding-box annotations, which is more realistic in the real world.
CNN-RNN: A Unified Framework for Multi-label Image Classification
TLDR
The proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image- label relevance, and it can be trained end-to-end from scratch to integrate both information in a unified framework.
Exploit Bounding Box Annotations for Multi-Label Object Recognition
TLDR
This paper first extracts object proposals from each image, then proposes to make use of ground-truth bounding box annotations (strong labels) to add another level of local information by using nearest-neighbor relationships of local regions to form a multi-view pipeline.
Correlative multi-label multi-instance image annotation
TLDR
A novel method is developed for achieving multi-label multi-instance image annotation, where image-level (bag-level) labels and region-level (instance-level) labels are both obtained and the associations between semantic concepts and visual features are mined both at the image level and at the region level.
HCP: A Flexible CNN Framework for Multi-Label Image Classification
TLDR
Experimental results on the Pascal VOC 2007 and VOC 2012 multi-label image datasets demonstrate the superiority of the proposed HCP framework over other state-of-the-art methods, where an arbitrary number of object segment hypotheses are taken as the inputs.
CNN Features Off-the-Shelf: An Astounding Baseline for Recognition
TLDR
A series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13 suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.
Multiple Object Recognition with Visual Attention
TLDR
The model is a deep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image and it is shown that the model learns to both localize and recognize multiple objects despite being given only class labels during training.
Very Deep Convolutional Networks for Large-Scale Image Recognition
TLDR
This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
Hierarchical matching with side information for image classification
TLDR
A hierarchical matching framework with so-called side information for image classification based on bag-of-words representation and two exemplar algorithms based on two types of side information: object confidence map and visual saliency map, from object detection priors and within-image contexts respectively are designed.