Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models

@article{Li2022DistillingEO,
  title={Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models},
  author={Xuhong Li and Haoyi Xiong and Yi Liu and Dingfu Zhou and Zeyu Chen and Yaqing Wang and Dejing Dou},
  journal={ArXiv},
  year={2022},
  volume={abs/2207.03335}
}
While fine-tuning pre-trained networks has become a popular way to train image segmentation models, such backbone networks for image segmentation are frequently pre-trained using image classification source datasets, e.g., ImageNet. Though image classification datasets could provide the backbone networks with rich visual features and discriminative ability, they are incapable of fully pre-training the target model (i.e., backbone+segmentation modules) in an end-to-end manner. The segmentation… 

References


Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach

This work harnesses the image-level labels to produce reliable pixel-level annotations and design a fully end-to-end network to learn to predict segmentation maps and gets a new state-of-the-art performance on the Pascal VOC.

Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation

Expectation-Maximization (EM) methods are developed for training semantic image segmentation models under weakly supervised and semi-supervised settings, and extensive experimental evaluation shows that the proposed techniques learn models delivering competitive results on the challenging PASCAL VOC 2012 image segmentation benchmark while requiring significantly less annotation effort.

FickleNet: Weakly and Semi-Supervised Semantic Image Segmentation Using Stochastic Inference

FickleNet explores diverse combinations of locations on feature maps created by generic deep neural networks and implicitly learns the coherence of each location in the feature maps, resulting in a localization map which identifies both discriminative and other parts of objects.

Rethinking Pre-training and Self-training

Self-training works well in exactly the setup where pre-training does not (using ImageNet to help COCO), and on the PASCAL segmentation dataset, although pre-training does help significantly, self-training still improves upon the pre-trained model.

Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation

It is found that varying dilation rates can effectively enlarge the receptive fields of convolutional kernels and more importantly transfer the surrounding discriminative information to non-discriminative object regions, promoting the emergence of these regions in the object localization maps.
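
The receptive-field enlargement described above follows a standard formula (not code from the paper): a k×k kernel with dilation rate d covers an effective span of k + (k-1)(d-1) pixels per side. A minimal sketch:

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Effective per-side span of a k-by-k convolution kernel with
    dilation rate d: the taps are spaced d pixels apart, so the
    kernel covers k + (k - 1) * (d - 1) pixels."""
    return k + (k - 1) * (d - 1)

# A 3x3 kernel with dilation 2 behaves like a sparse 5x5 kernel,
# and with dilation 6 like a sparse 13x13 kernel.
```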

Conditional Random Fields as Recurrent Neural Networks

A new form of convolutional neural network that combines the strengths of Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs)-based probabilistic graphical modelling is introduced, and top results are obtained on the challenging Pascal VOC 2012 segmentation benchmark.

Loss Max-Pooling for Semantic Image Segmentation

A novel loss max-pooling concept for handling imbalanced training data distributions, applicable as an alternative loss layer in deep neural networks for semantic image segmentation; it adaptively re-weights the contribution of each pixel based on its observed loss.
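
The paper derives the re-weighting in closed form; a simplified variant (a sketch, not the paper's exact formulation — the function name and the fixed top-fraction parameter are illustrative assumptions) averages only the hardest fraction of per-pixel losses so that abundant easy pixels do not dominate:

```python
import numpy as np

def loss_max_pool(pixel_losses: np.ndarray, p: float = 0.25) -> float:
    """Max-pooling over per-pixel losses: keep only the hardest
    fraction p of pixels and average their losses, down-weighting
    the easy majority to zero.

    pixel_losses: (H, W) array of per-pixel loss values.
    """
    flat = np.sort(pixel_losses.ravel())[::-1]  # descending: hardest first
    k = max(1, int(p * flat.size))              # number of pixels kept
    return float(flat[:k].mean())
```

With p = 1.0 this reduces to the ordinary mean loss, so the usual pixel-averaged objective is a special case.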

This looks like that: deep learning for interpretable image recognition

A deep network architecture, the prototypical part network (ProtoPNet), that reasons in a way similar to how ornithologists, physicians, and others would explain how to solve challenging image classification tasks, providing a level of interpretability that is absent in other interpretable deep models.

Cross-Model Consensus of Explanations and Beyond for Image Classification Models: An Empirical Study

An interpretation algorithm is used to attribute the importance of features (e.g., pixels or superpixels) as explanations, and a cross-model consensus of explanations is proposed to capture the common features.
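
A minimal sketch of the consensus idea, assuming each model's attribution map has already been computed; the paper's exact aggregation may differ, and the function name and min-max normalization choice are illustrative assumptions:

```python
import numpy as np

def consensus_explanation(attributions: list) -> np.ndarray:
    """Aggregate attribution maps from several models into one
    consensus map highlighting features the models agree on.

    attributions: list of (H, W) arrays, one attribution/saliency
    map per model for the same input image.
    Returns an (H, W) map in [0, 1].
    """
    normalized = []
    for a in attributions:
        a = np.abs(a)                      # importance magnitude
        rng = a.max() - a.min()
        normalized.append((a - a.min()) / rng if rng > 0
                          else np.zeros_like(a))
    # Pixels important to every model keep a high average score;
    # model-specific artifacts are averaged down.
    return np.mean(normalized, axis=0)
```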

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization

This work proposes a technique for producing ‘visual explanations’ for decisions from a large class of Convolutional Neural Network (CNN)-based models, making them more transparent and explainable, and shows that even non-attention-based models learn to localize discriminative regions of the input image.
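
Grad-CAM's core computation is two steps: global-average-pool the gradients of the class score with respect to a conv layer's activations to get per-channel weights, then ReLU the weighted sum of the activation maps. A numpy sketch, assuming the activations and gradients have already been extracted (e.g., via autograd hooks); the function name is an assumption:

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from one conv layer.

    activations: (K, H, W) feature maps A^k.
    gradients:   (K, H, W) gradients of the class score w.r.t. A^k.
    Returns an (H, W) map, ReLU'd and normalized to [0, 1].
    """
    # Per-channel weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))             # shape (K,)
    # Weighted sum over channels, then ReLU to keep positive evidence.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)
    cam = np.maximum(cam, 0.0)
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

In practice the (H, W) map is upsampled to the input resolution and overlaid on the image.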