Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective

Quan Cui, Bingchen Zhao, Zhao-Min Chen, Borui Zhao, Renjie Song, Jiajun Liang, Boyan Zhou, Osamu Yoshie
This work simultaneously considers the discriminability and transferability of deep representations in a typical supervised learning task, i.e., image classification. Through a comprehensive temporal analysis, we observe a trade-off between these two properties: discriminability keeps increasing as training progresses, while transferability diminishes sharply in the later training period. From the perspective of information-bottleneck theory, we reveal that the…
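The information-bottleneck perspective invoked in the abstract is usually formalized by the standard IB objective; the following is the textbook form, included for context rather than taken from this paper:

```latex
% Standard information-bottleneck Lagrangian: learn a stochastic
% representation Z of the input X that compresses X (small I(X;Z))
% while remaining predictive of the label Y (large I(Z;Y)).
\min_{p(z \mid x)} \; I(X;Z) - \beta \, I(Z;Y), \qquad \beta > 0
```

Under this framing, discriminability tracks I(Z;Y) while compression of I(X;Z) can discard features that other (downstream) tasks would need, which is one way to read the observed trade-off.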
Citations


OOD-CV: A Benchmark for Robustness to Out-of-Distribution Shifts of Individual Nuisances in Natural Images
Introduces OOD-CV, a benchmark dataset with out-of-distribution examples of 10 object categories varying in pose, shape, texture, context, and weather conditions, enabling benchmarking of models for image classification, object detection, and 3D pose estimation.
Self-Supervised Visual Representation Learning with Semantic Grouping
Proposes SlotCon, contrastive learning from data-driven semantic slots for joint semantic grouping and representation learning, which effectively decomposes complex scenes into semantic groups for feature learning and downstream tasks including object detection, instance segmentation, and semantic segmentation.

References

Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation
Presents Batch Spectral Penalization (BSP), a general approach that penalizes the largest singular values of batch feature matrices so that the remaining spectral components are relatively strengthened, boosting feature discriminability.
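The spectral penalty described above can be sketched as follows; this is an illustrative NumPy version, not the authors' reference implementation (the function name and `k` parameter are mine):

```python
import numpy as np

def batch_spectral_penalty(features, k=1):
    """Sketch of a BSP-style penalty: the sum of squared top-k singular
    values of a batch feature matrix. Adding this term to the task loss
    (with a small weight) suppresses the dominant spectral components
    so the remaining ones gain relative strength.
    `features` is an (n_samples, feature_dim) array.
    """
    # Singular values of the batch feature matrix, sorted largest first.
    s = np.linalg.svd(features, compute_uv=False)
    return float(np.sum(s[:k] ** 2))
```

In practice the penalty would be computed on a mini-batch of deep features and weighted against the classification (or adversarial adaptation) loss.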
Harmonizing Transferability and Discriminability for Adapting Object Detectors
Proposes a Hierarchical Transferability Calibration Network (HTCN) that calibrates the transferability of feature representations at three levels (local region, image, and instance) to harmonize transferability and discriminability.
What Makes for Good Views for Contrastive Learning?
Uses empirical analysis to better understand the importance of view selection, argues that the mutual information (MI) between views should be reduced while task-relevant information is kept intact, and devises unsupervised and semi-supervised frameworks that learn effective views by reducing their MI.
Why Do Better Loss Functions Lead to Less Transferable Features?
Shows that many objectives yield statistically significant improvements in ImageNet accuracy over vanilla softmax cross-entropy, yet the resulting fixed feature extractors transfer substantially worse to downstream tasks; the choice of loss has little effect when networks are fully fine-tuned on the new tasks.
Supervised Contrastive Learning
Proposes a training methodology that consistently outperforms cross-entropy on supervised learning tasks across different architectures and data augmentations by modifying the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting.
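A minimal sketch of a SupCon-style loss, assuming L2-normalizable embeddings and at least two samples per class in the batch; this is illustrative, not the authors' implementation:

```python
import numpy as np

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss sketch: for each anchor, pull together
    all same-label samples in the batch and push apart the rest."""
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    # L2-normalize so dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = len(labels)
    sim = z @ z.T / temperature
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, sim)  # exclude self-pairs
    # Log-softmax over all other samples for each anchor.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # Mean log-probability over each anchor's positives, then over anchors.
    per_anchor = np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)
    return float(-per_anchor.mean())
```

Embeddings clustered consistently with their labels give a lower loss than the same embeddings with mismatched labels.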
Decoupling Representation and Classifier for Long-Tailed Recognition
Shows that a straightforward approach that decouples representation learning and classification can outperform carefully designed losses, sampling strategies, and even complex modules with memory.
Learning Transferable Features with Deep Adaptation Networks
Proposes a Deep Adaptation Network (DAN) architecture that generalizes deep convolutional neural networks to the domain adaptation scenario, learns transferable features with statistical guarantees, and scales linearly via an unbiased estimate of the kernel embedding.
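The domain discrepancy that DAN minimizes is a kernel maximum mean discrepancy (MMD) between source and target features; a single-kernel biased-estimator sketch (DAN itself uses a multi-kernel, linear-time variant):

```python
import numpy as np

def mmd_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y
    under an RBF kernel k(a, b) = exp(-gamma * ||a - b||^2).
    X, Y are (n, d) and (m, d) arrays."""
    def k(A, B):
        # Pairwise squared Euclidean distances, then the RBF kernel.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return float(k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean())
```

Identical samples give zero discrepancy, while a shifted copy of the same samples gives a clearly positive value, which is the signal an adaptation network minimizes.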
Rethinking supervised pre-training for better downstream transferring
Proposes LOOK, a supervised pre-training method based on Leave-One-Out K-Nearest-Neighbors, which relieves overfitting to the upstream task by requiring each image only to share its class label with most of its k nearest neighbors, allowing each class to exhibit a multi-mode distribution and consequently preserving part of the intra-class variation for better transfer to downstream tasks.
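The leave-one-out k-NN criterion behind LOOK can be sketched as a simple agreement check; this is an illustrative hard-label version (the paper optimizes a differentiable relaxation), and the function name is mine:

```python
import numpy as np

def look_agreement(features, labels, k=3):
    """Fraction of samples whose label matches the majority label of
    their k nearest neighbors, with the sample itself left out."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared distances; inf on the diagonal excludes each
    # sample from its own neighbor set (the leave-one-out step).
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    hits = 0
    for i in range(len(labels)):
        nn = np.argsort(d2[i])[:k]
        # Majority label among the k nearest neighbors.
        vals, counts = np.unique(labels[nn], return_counts=True)
        hits += int(vals[np.argmax(counts)] == labels[i])
    return hits / len(labels)
```

On well-clustered features the agreement is 1.0; the key point is that it only requires local label consistency, so a class may occupy several separate modes.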
Learning deep representations by mutual information estimation and maximization
Shows that structure matters: incorporating knowledge about locality of the input into the objective can significantly improve a representation's suitability for downstream tasks, an important step towards flexible formulations of representation-learning objectives for specific end goals.
A Simple Framework for Contrastive Learning of Visual Representations
Shows that the composition of data augmentations plays a critical role in defining effective predictive tasks, that a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps than supervised learning.