Geometry Constrained Weakly Supervised Object Localization

@article{Lu2020GeometryCW,
  title={Geometry Constrained Weakly Supervised Object Localization},
  author={Weizeng Lu and Xi Jia and Weicheng Xie and Linlin Shen and Yicong Zhou and Jinming Duan},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.09727}
}
We propose a geometry constrained network, termed GC-Net, for weakly supervised object localization (WSOL). GC-Net consists of three modules: a detector, a generator, and a classifier. The detector predicts the object location defined by a set of coefficients describing a geometric shape (i.e., an ellipse or a rectangle), which is geometrically constrained by the mask produced by the generator. The classifier takes the resulting masked images as input and performs two complementary classification… 
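The abstract describes the detector's output as a set of shape coefficients that must agree with a mask. As a minimal sketch, assuming an ellipse parameterized by centre (cx, cy), semi-axes (a, b) and rotation theta (a hypothetical choice, not necessarily GC-Net's), a differentiable soft mask can be rendered with a sigmoid and applied to an image. Splitting the image into the masked region and its complement is only one plausible reading of the truncated "two complementary classification" sentence; the names `ellipse_mask`, `apply_mask` and `sharpness` are illustrative.

```python
import numpy as np


def ellipse_mask(h, w, cx, cy, a, b, theta, sharpness=10.0):
    """Render a soft ellipse mask of shape (h, w) with values in (0, 1).

    Hypothetical parameterization: centre (cx, cy) and semi-axes (a, b) in pixels,
    rotation theta in radians. `sharpness` (assumed) controls boundary hardness.
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    dx, dy = xs - cx, ys - cy
    # Rotate pixel offsets into the ellipse's own axis-aligned frame.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    # r < 1 inside the ellipse, r == 1 on its boundary, r > 1 outside.
    r = (u / a) ** 2 + (v / b) ** 2
    # Sigmoid of sharpness * (1 - r): near 1 inside, near 0 outside, differentiable.
    return 1.0 / (1.0 + np.exp(sharpness * (r - 1.0)))


def apply_mask(image, mask):
    """Split an (h, w, channels) image into the masked region and its complement.

    Assumption: the masked images fed to the classifier are the object region
    and the remaining background; the truncated abstract does not confirm this.
    """
    fg = image * mask[..., None]
    bg = image * (1.0 - mask)[..., None]
    return fg, bg


if __name__ == "__main__":
    mask = ellipse_mask(64, 64, cx=32, cy=32, a=20, b=10, theta=0.0)
    image = np.random.default_rng(0).random((64, 64, 3))
    fg, bg = apply_mask(image, mask)
    print(mask[32, 32], np.allclose(fg + bg, image))
```

Because the mask is a smooth function of the coefficients, gradients can flow back to whatever network predicts them, which is why a sigmoid is used here rather than a hard inside/outside test.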
Learning a Weight Map for Weakly-Supervised Localization
TLDR
This work proposes to employ an image classifier f and to train a generative network g that outputs, given the input image, a per-pixel weight map that indicates the location of the object within the image.
Unveiling the Potential of Structure Preserving for Weakly Supervised Object Localization
TLDR
This paper proposes a two-stage approach, termed structure-preserving activation (SPA), that fully leverages the structural information in convolutional features for WSOL: it uses high-order self-correlation (HSC) to extract the inherent structural information retained in the learned model and then aggregates the HSC of multiple points for precise object localization.
Weakly Supervised Object Localization as Domain Adaption
TLDR
This work offers a novel perspective that models WSOL as a domain adaption (DA) task, in which a score estimator trained on the source (image) domain is tested on the target (pixel) domain to locate objects, and designs a DA-WSOL pipeline that better integrates DA approaches into WSOL to improve localization performance.
ViTOL: Vision Transformer for Weakly Supervised Object Localization
TLDR
This work leverages the self-attention of the vision transformer and introduces a patch-based attention dropout layer (p-ADL) to increase the coverage of the localization map, together with a gradient attention rollout mechanism to generate class-dependent attention maps.
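As background for the rollout mechanism mentioned above, a minimal sketch of plain attention rollout (Abnar and Zuidema, 2020) follows. It is the non-gradient baseline, not ViTOL's class-dependent gradient variant. It assumes per-layer attention arrays of shape (heads, tokens, tokens) whose rows sum to 1, a [CLS] token at index 0, and a 0.5 identity mix to model residual connections; the function names are illustrative.

```python
import numpy as np


def attention_rollout(attentions, residual_weight=0.5):
    """Plain attention rollout over a list of per-layer attention tensors.

    Each element has shape (heads, tokens, tokens) with rows summing to 1.
    Returns a (tokens, tokens) matrix of rolled-out attention.
    """
    n = attentions[0].shape[-1]
    eye = np.eye(n)
    rollout = eye.copy()
    for attn in attentions:
        a = attn.mean(axis=0)  # average over heads
        # Model the residual connection by mixing in the identity.
        a = residual_weight * eye + (1.0 - residual_weight) * a
        a = a / a.sum(axis=-1, keepdims=True)  # keep rows stochastic
        rollout = a @ rollout  # later layers multiply on the left
    return rollout


def cls_patch_map(rollout, grid_h, grid_w):
    """Relevance of each patch token to the [CLS] token (assumed to be token 0)."""
    return rollout[0, 1:].reshape(grid_h, grid_w)
```

The patch grid returned by `cls_patch_map` would typically be upsampled to the image size and thresholded in the same way as a class activation map.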
CREAM: Weakly Supervised Object Localization via Class RE-Activation Mapping
TLDR
This paper empirically proves that the incomplete localization problem of CAM-based methods, which tend to highlight only the most discriminative object parts, is associated with the mixup of activation values between less discriminative foreground regions and the background, and proposes Class RE-Activation Mapping (CREAM), a novel clustering-based approach to boost the activation values of the integral object regions.
Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets
TLDR
It is argued that the WSOL task is ill-posed with only image-level labels, and a new evaluation protocol is proposed in which full supervision is limited to a small held-out set that does not overlap with the test set.
Strengthen Learning Tolerance for Weakly Supervised Object Localization
TLDR
A novel WSOL framework, referred to as SLT-Net, is proposed to strengthen learning tolerance: it allows the localizer to make mistakes when classifying semantically similar categories so that it does not concentrate too heavily on discriminative local regions.
Foreground Mining via Contrastive Guidance for Weakly Supervised Object Localization
TLDR
This paper proposes a novel WSOL framework that localizes the entire object to the right extent via contrastive learning and achieves state-of-the-art performance on the CUB-200-2011 and ImageNet benchmarks in terms of Top-1 Loc, GT-Loc, and MaxBoxAccV2.
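MaxBoxAccV2, one of the metrics named above, is commonly described as follows: BoxAcc(τ, δ) is the fraction of images whose box, extracted from the CAM at threshold τ, has IoU ≥ δ with the ground truth; MaxBoxAcc(δ) maximises this over τ; and MaxBoxAccV2 averages MaxBoxAcc over δ ∈ {0.3, 0.5, 0.7}. The sketch below assumes one ground-truth box per image and, as a simplification, takes the tight box around all pixels above τ instead of matching connected components, so its numbers will not match the official evaluation code. All function names are illustrative.

```python
import numpy as np


def box_area(box):
    """Area of an (x0, y0, x1, y1) box with inclusive pixel coordinates."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def box_iou(a, b):
    """Intersection over union of two inclusive-coordinate boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)
    return inter / (box_area(a) + box_area(b) - inter)


def cam_to_box(cam, tau):
    """Tight box around every pixel with cam >= tau (simplified: no connected components)."""
    ys, xs = np.nonzero(cam >= tau)
    if xs.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))


def max_box_acc_v2(cams, gt_boxes, deltas=(0.3, 0.5, 0.7), num_taus=100):
    """Simplified MaxBoxAccV2: per-delta maximum over CAM thresholds, averaged over deltas."""
    taus = np.linspace(0.0, 1.0, num_taus, endpoint=False)
    hits = np.zeros((num_taus, len(deltas)))
    for cam, gt in zip(cams, gt_boxes):
        # Min-max normalise each CAM so one threshold grid applies to all images.
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-12)
        for t, tau in enumerate(taus):
            box = cam_to_box(cam, tau)
            if box is None:
                continue
            iou = box_iou(box, gt)
            for d, delta in enumerate(deltas):
                hits[t, d] += float(iou >= delta)
    box_acc = hits / len(cams)  # BoxAcc(tau, delta)
    return float(box_acc.max(axis=0).mean())
```

Taking the maximum over τ separately for each δ makes the score independent of any single, hand-picked CAM threshold, which is the motivation usually given for threshold-free box metrics.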
Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization
TLDR
This work demonstrates that the misalignment between feature directions and class-specific weights suppresses CAM activation in areas that are less discriminative but belong to the target object, and proposes a method that aligns feature directions with the class-specific weight.
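The misalignment argument rests on a simple identity: the CAM for class c at location (x, y) is the dot product w_c · f(x, y), which equals ‖w_c‖ ‖f(x, y)‖ cos θ. A feature vector with a large norm but pointing away from w_c therefore still yields a low activation. The sketch below assumes (K, H, W) last-layer feature maps and a (C, K) classifier weight matrix; it only illustrates this decomposition and does not implement the cited alignment method.

```python
import numpy as np


def class_activation_map(features, weights, c):
    """CAM_c(x, y) = sum_k weights[c, k] * features[k, x, y].

    features: (K, H, W) last-layer feature maps; weights: (C, K) classifier weights.
    """
    return np.tensordot(weights[c], features, axes=([0], [0]))


def cam_decomposition(features, weights, c, eps=1e-12):
    """Split the CAM into feature norm, weight norm and the cosine between them."""
    f_norm = np.linalg.norm(features, axis=0)  # (H, W) per-location feature norm
    w_norm = np.linalg.norm(weights[c])        # scalar norm of the class weight
    cam = class_activation_map(features, weights, c)
    cosine = cam / (f_norm * w_norm + eps)     # direction agreement in [-1, 1]
    return f_norm, w_norm, cosine
```

For instance, with w_c = (1, 0), a feature (0, 5) has norm 5 yet zero activation, while (1, 0) has norm 1 and activation 1: the direction, not the magnitude, decides what the map highlights.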
Shallow Feature Matters for Weakly Supervised Object Localization
TLDR
This paper proposes a simple but effective Shallow feature-aware Pseudo supervised Object Localization (SPOL) model for accurate WSOL, which makes the most of the low-level features embedded in shallow layers, and proposes a general class-agnostic segmentation model to obtain accurate object masks.
...
