Geometry Constrained Weakly Supervised Object Localization

@article{Lu2020GeometryCW,
  title={Geometry Constrained Weakly Supervised Object Localization},
  author={Weizeng Lu and Xi Jia and Weicheng Xie and Linlin Shen and Yicong Zhou and Jinming Duan},
  journal={ArXiv},
  year={2020},
  volume={abs/2007.09727}
}
We propose a geometry constrained network, termed GC-Net, for weakly supervised object localization (WSOL). GC-Net consists of three modules: a detector, a generator and a classifier. The detector predicts the object location defined by a set of coefficients describing a geometric shape (i.e. ellipse or rectangle), which is geometrically constrained by the mask produced by the generator. The classifier takes the resulting masked images as input and performs two complementary classification… 
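As a rough illustration of the idea in the abstract (a geometric shape described by a few coefficients, turned into a mask that is applied to the image), here is a minimal NumPy sketch. The parameterizations used here (`cx, cy, a, b, theta` for the ellipse and `cx, cy, w, h` for the rectangle), the function names, and the hard binary masks are assumptions made for illustration; they are not taken from the paper, which may use a different, differentiable formulation.

```python
import numpy as np


def ellipse_mask(height, width, cx, cy, a, b, theta=0.0):
    """Binary mask: 1 inside a rotated ellipse, 0 outside (hypothetical parameterization)."""
    ys, xs = np.mgrid[0:height, 0:width]
    dx, dy = xs - cx, ys - cy
    # Rotate pixel offsets into the ellipse's own axis frame.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return ((u / a) ** 2 + (v / b) ** 2 <= 1.0).astype(np.float32)


def rectangle_mask(height, width, cx, cy, w, h):
    """Binary mask: 1 inside an axis-aligned rectangle, 0 outside (hypothetical parameterization)."""
    ys, xs = np.mgrid[0:height, 0:width]
    inside = (np.abs(xs - cx) <= w / 2) & (np.abs(ys - cy) <= h / 2)
    return inside.astype(np.float32)


# Apply a shape mask to a dummy image, as a classifier input would be prepared.
image = np.ones((8, 8, 3), dtype=np.float32)
rect = rectangle_mask(8, 8, cx=3.5, cy=3.5, w=4, h=2)
masked_image = image * rect[..., None]  # keep only the pixels inside the shape
```

A hard threshold like this is not differentiable with respect to the shape coefficients, so a trainable detector would need a soft version of the mask; that choice is left open here.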
Learning a Weight Map for Weakly-Supervised Localization
TLDR
This work proposes to employ an image classifier f and to train a generative network g that outputs, given the input image, a per-pixel weight map that indicates the location of the object within the image.
ViTOL: Vision Transformer for Weakly Supervised Object Localization
TLDR
The self-attention of the vision transformer is leveraged, a patch-based attention dropout layer (p-ADL) is introduced to increase the coverage of the localization map, and a gradient attention rollout mechanism is introduced to generate class-dependent attention maps.
Evaluation for Weakly Supervised Object Localization: Protocol, Metrics, and Datasets
TLDR
It is argued that the WSOL task is ill-posed with only image-level labels, and a new evaluation protocol is proposed where full supervision is limited to only a small held-out set not overlapping with the test set.
Strengthen Learning Tolerance for Weakly Supervised Object Localization
TLDR
A novel framework for WSOL, referred to as SLT-Net, is proposed to strengthen learning tolerance: it allows the localizer to make mistakes when classifying similar semantics so that it does not concentrate too much on the discriminative local regions.
Foreground Mining via Contrastive Guidance for Weakly Supervised Object Localization
TLDR
This paper proposes a novel WSOL framework that localizes the entire object to the right extent via contrastive learning and achieves state-of-the-art performance on the CUB-200-2011 and ImageNet benchmarks in terms of Top-1 Loc, GT-Loc and MaxBoxAccV2.
Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization
TLDR
This work demonstrates that the misalignment suppresses the activation of CAM in areas that are less discriminative but belong to the target object, and proposes a method to align feature directions with a class-specific weight.
Shallow Feature Matters for Weakly Supervised Object Localization
TLDR
This paper proposes a simple but effective Shallow feature-aware Pseudo supervised Object Localization (SPOL) model for accurate WSOL, which makes the most of low-level features embedded in shallow layers, and proposes a general class-agnostic segmentation model to obtain an accurate object mask.
Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection
TLDR
This work proposes to learn pixel-wise cross-domain correspondences for more precise knowledge transfer through a novel cross-domain co-attention scheme trained as region competition, and achieves consistent improvements over existing approaches by a considerable margin.
Background Activation Suppression for Weakly Supervised Object Localization
TLDR
A Background Activation Suppression (BAS) method is proposed, designed to facilitate the learning of the generator by suppressing the background activation value; it achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets.
Background-aware Classification Activation Map for Weakly Supervised Object Localization
TLDR
The background-aware classification activation map (B-CAM) is proposed to simultaneously learn localization scores of both object and background with only image-level labels, improving object localization and suppressing background activation.

References

Self-taught object localization with deep networks
This paper introduces self-taught object localization, a novel approach that leverages deep convolutional networks trained for whole-image recognition to localize objects in images without additional
Self-produced Guidance for Weakly-supervised Object Localization
TLDR
Self-produced Guidance (SPG) masks are proposed, which separate the foreground, i.e., the object of interest, from the background to provide the classification networks with spatial correlation information of pixels.
Adversarial Complementary Learning for Weakly Supervised Object Localization
TLDR
This work mathematically proves that class localization maps can be obtained by directly selecting the class-specific feature maps of the last convolutional layer, which paves a simple way to identify object regions, and presents a simple network architecture including two parallel classifiers for object localization.
DANet: Divergent Activation for Weakly Supervised Object Localization
TLDR
A divergent activation (DA) approach is proposed that targets learning complementary and discriminative visual patterns for image classification and weakly supervised object localization from the perspective of discrepancy.
Weakly Supervised Localization and Learning with Generic Knowledge
TLDR
A conditional random field is proposed that starts from generic knowledge and then progressively adapts to the new class; it allows training any state-of-the-art object detector in a weakly supervised fashion, although such a detector would normally require object location annotations.
Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization
TLDR
The key idea is to hide patches in a training image randomly, forcing the network to seek other relevant parts when the most discriminative part is hidden, which obtains superior performance compared to previous methods for weakly-supervised object localization on the ILSVRC dataset.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TLDR
This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation
TLDR
It is shown that properly combining saliency and attention maps allows for reliable cues capable of significantly boosting the performance, and a simple yet powerful hierarchical approach to discover the class-agnostic salient regions, obtained using a salient object detector, is proposed.
You Only Look Once: Unified, Real-Time Object Detection
TLDR
Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
SSD: Single Shot MultiBox Detector
TLDR
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.