Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective

Authors: Quan Cui, Bingchen Zhao, Zhao-Min Chen, Borui Zhao, Renjie Song, Jiajun Liang, Boyan Zhou, and Osamu Yoshie
This work simultaneously considers the discriminability and transferability of deep representations in a typical supervised learning task, i.e., image classification. Through a comprehensive temporal analysis, we observe a trade-off between these two properties: discriminability keeps increasing as training progresses, while transferability diminishes sharply in the later training period. From the perspective of information-bottleneck theory, we reveal that the…
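The information-bottleneck perspective invoked by the abstract can be sketched with the standard IB objective (a textbook form, not taken from the truncated text above): a representation $Z$ of input $X$ is trained to compress away input information while preserving what predicts the label $Y$,

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

where the compression term $I(X;Z)$ tends to discard input-specific (and hence transferable) information, while the prediction term $I(Z;Y)$ drives discriminability; $\beta$ controls the balance between the two.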


Self-Supervised Visual Representation Learning with Semantic Grouping
This paper proposes contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning; it effectively decomposes complex scenes into semantic groups for feature learning and downstream tasks, including object detection, instance segmentation, and semantic segmentation.


What Makes for Good Views for Contrastive Learning?
This paper uses empirical analysis to better understand the importance of view selection, and argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
Supervised Contrastive Learning
A novel training methodology is proposed that consistently outperforms cross-entropy on supervised learning tasks across different architectures and data augmentations; it modifies the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting.
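The supervised contrastive loss described above pulls together embeddings that share a label and pushes apart the rest. A minimal pure-Python sketch of that loss follows; the function name `supcon_loss` and the toy inputs are illustrative, not the paper's reference implementation, which operates on batched tensors with GPU kernels.

```python
import math

def supcon_loss(embeddings, labels, tau=0.1):
    """Sketch of a supervised contrastive loss over L2-normalised embeddings.

    embeddings: list of feature vectors (lists of floats).
    labels: class label per embedding.
    tau: temperature scaling the similarities.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    n = len(embeddings)
    total, count = 0.0, 0
    for i in range(n):
        # positives: other samples sharing the anchor's label
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # denominator sums similarities over all non-anchor samples
        denom = sum(math.exp(dot(embeddings[i], embeddings[a]) / tau)
                    for a in range(n) if a != i)
        for p in positives:
            total -= math.log(math.exp(dot(embeddings[i], embeddings[p]) / tau) / denom)
            count += 1
    return total / count
```

On a toy batch, the loss is near zero when same-class embeddings align and grows large when they do not, which is the behaviour the abstract attributes to the modified batch contrastive loss.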
Learning Transferable Features with Deep Adaptation Networks
A new Deep Adaptation Network (DAN) architecture is proposed, which generalizes deep convolutional neural networks to the domain-adaptation scenario, learns transferable features with statistical guarantees, and scales linearly via an unbiased estimate of the kernel embedding.
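The statistical distance underlying DAN-style adaptation is the maximum mean discrepancy (MMD) between source and target features. Below is a sketch of the unbiased squared-MMD estimator with a linear kernel for readability; DAN itself uses a multi-kernel (Gaussian) MK-MMD, and the function name `mmd2_linear` is illustrative.

```python
def mmd2_linear(xs, ys):
    """Unbiased squared MMD with a linear kernel k(u, v) = u.v (sketch).

    xs, ys: lists of feature vectors from the source and target domains.
    Returns ~0 when the two samples come from the same distribution.
    """
    def k(u, v):
        return sum(a * b for a, b in zip(u, v))

    m, n = len(xs), len(ys)
    # within-sample terms exclude i == j for unbiasedness
    xx = sum(k(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    yy = sum(k(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # cross term couples source and target samples
    xy = sum(k(x, y) for x in xs for y in ys) * 2.0 / (m * n)
    return xx + yy - xy
```

Minimizing this quantity over intermediate-layer features is what aligns the two domains; the unbiased pairwise form is also what allows the linear-time estimates the abstract mentions.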
Decoupling Representation and Classifier for Long-Tailed Recognition
It is shown that carefully designed losses, sampling strategies, and even complex modules with memory can be outperformed by a straightforward approach that decouples representation and classification.
Learning deep representations by mutual information estimation and maximization
It is shown that structure matters: incorporating knowledge about locality in the input into the objective can significantly improve a representation’s suitability for downstream tasks and is an important step towards flexible formulations of representation learning objectives for specific end-goals.
A Simple Framework for Contrastive Learning of Visual Representations
It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
DenseCL is presented, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images and outperforms the state-of-the-art methods by a large margin.
BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition
A unified Bilateral-Branch Network (BBN) is proposed to take care of both representation learning and classifier learning simultaneously, where each branch performs its own duty separately.
Squeeze-and-Excitation Networks
This work proposes a novel architectural unit, termed the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and shows that these blocks can be stacked to form SENet architectures that generalise extremely effectively across different datasets.
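The squeeze-excite-recalibrate pipeline in that abstract can be sketched in a few lines of pure Python; the function name `se_block` and the tiny fully connected weights `w1`/`w2` are hypothetical stand-ins for the learned bottleneck layers, which in SENet are trained end-to-end on real tensors.

```python
import math

def se_block(feature_map, w1, w2):
    """Squeeze-and-Excitation sketch on a C x H x W nested-list "tensor".

    w1: (C/r) x C reduction weights; w2: C x (C/r) expansion weights,
    where r is the bottleneck reduction ratio.
    """
    C = len(feature_map)
    # squeeze: global average pool collapses each channel to one scalar
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # excitation: FC -> ReLU -> FC -> sigmoid yields one gate per channel
    h = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hi for w, hi in zip(row, h)))) for row in w2]
    # recalibrate: rescale every spatial position of channel c by its gate s[c]
    return [[[v * s[c] for v in row] for row in feature_map[c]] for c in range(C)]
```

Because the gates pass through a sigmoid, each channel is scaled by a value in (0, 1) computed from the global channel statistics, which is the "adaptive recalibration" the abstract describes.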
Contrastive Learning for Unpaired Image-to-Image Translation
The framework enables one-sided translation in the unpaired image-to-image translation setting, while improving quality and reducing training time, and can be extended to the training setting where each "domain" is only a single image.