Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective

@article{Cui2022DiscriminabilityTransferabilityTA,
  title={Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective},
  author={Quan Cui and Bingchen Zhao and Zhao-Min Chen and Borui Zhao and Renjie Song and Jiajun Liang and Boyan Zhou and Osamu Yoshie},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.03871}
}
This work jointly considers the discriminability and transferability of deep representations in the typical supervised learning task, i.e., image classification. Through a comprehensive temporal analysis, we observe a trade-off between these two properties: discriminability keeps increasing as training progresses, while transferability sharply diminishes in the later training period. From the perspective of information-bottleneck theory, we reveal that the…
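Below is a minimal sketch of the kind of temporal analysis the abstract describes, assuming a set of saved training checkpoints; `load_backbone` and `extract_features` are hypothetical helpers, and the linear-probe protocol is a common stand-in for measuring discriminability (source-task accuracy) and transferability (downstream-task accuracy), not necessarily the paper's exact measurement.

```python
# Hedged sketch: track discriminability vs. transferability across checkpoints.
# `load_backbone` and `extract_features` are hypothetical helpers; the linear
# probe on frozen features is a common proxy, not the paper's exact protocol.
from sklearn.linear_model import LogisticRegression

def linear_probe_accuracy(x_train, y_train, x_test, y_test):
    """Fit a linear classifier on frozen features and report test accuracy."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(x_train, y_train)
    return clf.score(x_test, y_test)

def tradeoff_curve(checkpoints, source_data, target_data):
    """For each checkpoint, record source-task (discriminability) and
    downstream-task (transferability) linear-probe accuracy."""
    curve = []
    for ckpt in checkpoints:
        backbone = load_backbone(ckpt)  # hypothetical loader
        s_tr_x, s_tr_y, s_te_x, s_te_y = extract_features(backbone, source_data)
        t_tr_x, t_tr_y, t_te_x, t_te_y = extract_features(backbone, target_data)
        curve.append({
            "checkpoint": ckpt,
            "discriminability": linear_probe_accuracy(s_tr_x, s_tr_y, s_te_x, s_te_y),
            "transferability": linear_probe_accuracy(t_tr_x, t_tr_y, t_te_x, t_te_y),
        })
    return curve
```

Plotting both accuracies against the checkpoint index reproduces the qualitative picture described above: the source-task curve keeps rising while the downstream curve peaks and then degrades late in training.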

Self-Supervised Visual Representation Learning with Semantic Grouping
TLDR
This paper proposes SlotCon, contrastive learning over data-driven semantic slots for joint semantic grouping and representation learning; it effectively decomposes complex scenes into semantic groups for feature learning and transfers well to downstream tasks including object detection, instance segmentation, and semantic segmentation.

References

Showing 1–10 of 50 references
What makes for good views for contrastive learning
TLDR
This paper uses empirical analysis to better understand the importance of view selection, argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devises unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
Supervised Contrastive Learning
TLDR
A novel training methodology is proposed that consistently outperforms cross entropy on supervised learning tasks across different architectures and data augmentations; it modifies the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting.
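A compact PyTorch sketch of a supervised contrastive loss in the spirit of this entry (a simplified single-view rendering, not the authors' reference implementation): every other sample in the batch that shares the anchor's label is treated as a positive.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """features: (N, D) embeddings; labels: (N,) integer class ids.
    Every other sample sharing the anchor's label is treated as a positive."""
    features = F.normalize(features, dim=1)
    logits = features @ features.T / temperature            # (N, N) similarities
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    logits = logits.masked_fill(self_mask, float("-inf"))   # drop self-pairs
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0                                   # anchors with >= 1 positive
    sum_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(1)
    return -(sum_log_prob_pos[valid] / pos_counts[valid]).mean()
```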
Learning Transferable Features with Deep Adaptation Networks
TLDR
A new Deep Adaptation Network (DAN) architecture is proposed that generalizes deep convolutional neural networks to the domain adaptation scenario, can learn transferable features with statistical guarantees, and can scale linearly via an unbiased estimate of the kernel embedding.
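The DAN entry rests on matching source and target feature distributions via kernel embeddings; the sketch below uses a standard multi-bandwidth Gaussian-kernel MMD (a biased quadratic-time estimate kept simple for clarity, whereas the paper's linear-time unbiased estimator is what makes it scale).

```python
import torch

def gaussian_mmd(source, target, bandwidths=(1.0, 2.0, 4.0, 8.0)):
    """Biased MMD^2 estimate between source (n, d) and target (m, d) features
    using a sum of Gaussian kernels. DAN-style adaptation adds such a term to
    the task loss for selected layers (simplified here)."""
    def kernel(x, y):
        d2 = torch.cdist(x, y).pow(2)                      # pairwise squared distances
        return sum(torch.exp(-d2 / (2.0 * bw ** 2)) for bw in bandwidths)
    k_ss = kernel(source, source).mean()
    k_tt = kernel(target, target).mean()
    k_st = kernel(source, target).mean()
    return k_ss + k_tt - 2.0 * k_st
```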
Decoupling Representation and Classifier for Long-Tailed Recognition
TLDR
It is shown that it is possible to outperform carefully designed losses, sampling strategies, and even complex modules with memory by using a straightforward approach that decouples representation learning and classification.
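A hedged sketch of the decoupling recipe this entry refers to: learn the representation as usual, then freeze the backbone and re-train only a linear classifier on class-balanced batches (the backbone, loader, and schedule here are placeholders, not the authors' exact setup).

```python
import torch
import torch.nn as nn

def retrain_classifier(backbone, feat_dim, num_classes, balanced_loader,
                       epochs=10, lr=0.1, device="cuda"):
    """Stage 2 of a decoupled pipeline: freeze the representation and fit a new
    linear classifier on class-balanced batches (a simplified cRT-style sketch)."""
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)                      # freeze the representation
    classifier = nn.Linear(feat_dim, num_classes).to(device)
    optimizer = torch.optim.SGD(classifier.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in balanced_loader:       # class-balanced sampling
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                feats = backbone(images)
            loss = criterion(classifier(feats), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return classifier
```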
Learning deep representations by mutual information estimation and maximization
TLDR
It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.
A Simple Framework for Contrastive Learning of Visual Representations
TLDR
It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
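The contrastive objective behind this entry can be written compactly; below is a simplified NT-Xent-style loss over two augmented views (the learnable projection head and augmentation pipeline are omitted), a sketch rather than the reference implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, D) projections of two augmented views of the same N images.
    Each sample's positive is its counterpart in the other view; the remaining
    2N - 2 samples in the batch act as negatives."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)      # (2N, D)
    sim = z @ z.T / temperature                             # (2N, 2N)
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))              # drop self-similarity
    # index of the positive for each row: i <-> i + n
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)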
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
TLDR
DenseCL is presented, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of an input image, and outperforms state-of-the-art methods by a large margin.
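A hedged sketch of a pixel-level contrastive term in the spirit of this entry: dense features of two views are matched by cosine similarity, and locations from other images in the batch serve as negatives. This is greatly simplified; the actual method also uses a momentum encoder, a negative queue, and a global image-level term.

```python
import torch
import torch.nn.functional as F

def dense_contrastive_loss(f1, f2, temperature=0.2):
    """f1, f2: (N, C, H, W) dense projections of two views of the same images.
    For each location in view 1, the positive is its most-similar location in
    view 2 of the same image; other images' pooled features act as negatives."""
    n, c, h, w = f1.shape
    q = F.normalize(f1.flatten(2).transpose(1, 2), dim=-1)   # (N, HW, C)
    k = F.normalize(f2.flatten(2).transpose(1, 2), dim=-1)   # (N, HW, C)
    sim = torch.bmm(q, k.transpose(1, 2))                    # (N, HW, HW)
    match = sim.argmax(dim=-1)                               # cross-view correspondence
    pos = torch.gather(k, 1, match.unsqueeze(-1).expand(-1, -1, c))
    l_pos = (q * pos).sum(-1, keepdim=True) / temperature    # (N, HW, 1)
    # negatives: pooled dense features of every image, excluding the query's own
    neg_bank = F.normalize(k.mean(dim=1), dim=-1)            # (N, C)
    l_neg = torch.einsum("npc,mc->npm", q, neg_bank) / temperature
    own = torch.eye(n, dtype=torch.bool, device=f1.device).unsqueeze(1)
    l_neg = l_neg.masked_fill(own, float("-inf"))
    logits = torch.cat([l_pos, l_neg], dim=-1).flatten(0, 1) # (N*HW, 1+N)
    targets = torch.zeros(logits.size(0), dtype=torch.long, device=f1.device)
    return F.cross_entropy(logits, targets)
```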
BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition
TLDR
A unified Bilateral-Branch Network (BBN) is proposed to take care of both representation learning and classifier learning simultaneously, where each branch performs its own duty separately.
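A hedged sketch of the cumulative-learning idea in this entry: a weight alpha decays over training so the uniform-sampling branch dominates early (representation learning) and the reversed-sampling branch dominates late (classifier rebalancing); the exact mixing point (features vs. logits) is simplified here.

```python
import torch.nn.functional as F

def cumulative_alpha(epoch, max_epochs):
    """Cumulative-learning schedule in the spirit of BBN: alpha = 1 - (T / T_max)^2,
    shifting emphasis from the uniform-sampling branch to the reversed-sampling
    branch as training proceeds."""
    return 1.0 - (epoch / max_epochs) ** 2

def bilateral_loss(logits_uniform, logits_reversed, y_uniform, y_reversed, alpha):
    """Mix the two branches' outputs and supervise against both labels, weighted
    by alpha (simplified: BBN mixes branch features through shared classifiers
    rather than raw logits)."""
    mixed = alpha * logits_uniform + (1.0 - alpha) * logits_reversed
    return (alpha * F.cross_entropy(mixed, y_uniform)
            + (1.0 - alpha) * F.cross_entropy(mixed, y_reversed))
```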
Squeeze-and-Excitation Networks
TLDR
This work proposes a novel architectural unit, termed the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and shows that these blocks can be stacked to form SENet architectures that generalise extremely effectively across different datasets.
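The SE block in this entry is compact enough to show directly; below is a common PyTorch rendering (global-average-pool squeeze, bottleneck MLP excitation, channel-wise rescaling), with the reduction ratio set to a typically used value.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: squeeze spatial context with global average
    pooling, excite with a two-layer bottleneck MLP, then rescale channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * w                     # channel-wise recalibration
```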
Contrastive Learning for Unpaired Image-to-Image Translation
TLDR
The framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time, and can be extended to the training setting where each "domain" is only a single image.
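A hedged sketch of a patchwise contrastive term consistent with this entry: output-image patch features are pulled toward input-image features at the same spatial locations and pushed away from other sampled locations of the same image (simplified; the actual method draws patches from several encoder layers through small MLP heads).

```python
import torch
import torch.nn.functional as F

def patch_nce_loss(feat_out, feat_in, num_patches=256, temperature=0.07):
    """feat_out, feat_in: (N, C, H, W) features of the translated image and the
    input image. Positives are same-location patch pairs; other sampled
    locations within the same image serve as negatives (simplified)."""
    n, c, h, w = feat_out.shape
    q = F.normalize(feat_out.flatten(2).transpose(1, 2), dim=-1)  # (N, HW, C)
    k = F.normalize(feat_in.flatten(2).transpose(1, 2), dim=-1)   # (N, HW, C)
    idx = torch.randperm(h * w, device=feat_out.device)[:min(num_patches, h * w)]
    q, k = q[:, idx], k[:, idx]                                   # sample locations
    logits = torch.bmm(q, k.transpose(1, 2)) / temperature        # (N, P, P)
    targets = torch.arange(logits.size(1), device=logits.device)
    targets = targets.unsqueeze(0).expand(n, -1)                  # diagonal = positive
    return F.cross_entropy(logits.flatten(0, 1), targets.flatten())
```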
...