Corpus ID: 231918497

Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning

@inproceedings{Zhang2021UnleashingTP,
  title={Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning},
  author={Yifan Zhang and Bryan Hooi and D. Hu and Jian Liang and Jiashi Feng},
  booktitle={NeurIPS},
  year={2021}
}
Contrastive self-supervised learning (CSL) has attracted increasing attention for model pre-training via unlabeled data. The resulting CSL models provide instance-discriminative visual features that are uniformly scattered in the feature space. During deployment, the common practice is to directly fine-tune CSL models with cross-entropy, which, however, may not be the best strategy in practice. Although cross-entropy tends to separate inter-class features, the resulting models still have limited… 
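To make the contrast-regularized idea concrete, here is a minimal PyTorch sketch (not the paper's exact objective) that fine-tunes a pre-trained encoder with cross-entropy plus a generic supervised contrastive regularizer; `encoder`, `classifier`, `temperature`, and the weight `lambda_con` are illustrative placeholders, not names from the paper.

```python
import torch
import torch.nn.functional as F

def sup_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive regularizer on L2-normalized features.

    Pulls same-class samples in a batch together and pushes different-class
    samples apart; a stand-in for a contrast-regularized fine-tuning term.
    """
    z = F.normalize(features, dim=1)                     # (B, D)
    sim = z @ z.t() / temperature                        # (B, B) pairwise similarities
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask_self, float('-inf'))           # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels[:, None] == labels[None, :]) & ~mask_self
    pos_counts = pos_mask.sum(1).clamp(min=1)
    # mean log-likelihood of positives per anchor (anchors without positives are skipped)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_counts
    return loss[pos_mask.any(1)].mean()

def fine_tune_step(encoder, classifier, optimizer, images, labels, lambda_con=0.1):
    """One fine-tuning step: cross-entropy plus the contrastive regularizer."""
    feats = encoder(images)
    logits = classifier(feats)
    loss = F.cross_entropy(logits, labels) + lambda_con * sup_contrastive_loss(feats, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```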
Multi-Modal Mixup for Robust Fine-tuning
TLDR
A new end-to-end fine-tuning method for robust representations is provided, which encourages better uniformity and alignment scores and fine-tunes the multi-modal model with contrastive learning on hard negative samples as well as ordinary negative and positive samples.
Neighborhood Consensus Contrastive Learning for Backward-Compatible Representation
TLDR
A Neighborhood Consensus Contrastive Learning (NCCL) method is proposed, which learns backward-compatible representations from a neighborhood-consensus perspective using both embedding structures and discriminative knowledge, ensuring backward compatibility without impairing the accuracy of the new model.
Distance-based Hyperspherical Classification for Multi-source Open-Set Domain Adaptation
TLDR
This work tackles multi-source Open-Set domain adaptation by introducing HyMOS: a straightforward model that exploits the power of contrastive learning and the properties of its hyperspherical feature space to correctly predict known labels on the target, while rejecting samples belonging to any unknown class.
How Well Does Self-Supervised Pre-Training Perform with Streaming Data?
TLDR
This paper conducts the first thorough and dedicated investigation on self-supervised pretraining with streaming data, aiming to shed light on the model behavior under this overlooked setup, and suggests that, in practice, the cumbersome joint training can be replaced mainly by sequential learning.
How Well Self-Supervised Pre-Training Performs with Streaming Data?
TLDR
Surprisingly, sequential self-supervised learning exhibits almost the same performance as the joint training when the distribution shifts within streaming data are mild, and is recommended as a more efficient yet performance-competitive representation learning practice for real-world applications.
Deep Long-Tailed Learning: A Survey
TLDR
A comprehensive survey on recent advances in deep long-tailed learning is provided, highlighting important applications of deep long-tailed learning and identifying several promising directions for future research.
Debiased Visual Question Answering from Feature and Sample Perspectives
TLDR
A method named D-VQA is proposed to alleviate the above challenges from the feature and sample perspectives, which applies two unimodal bias detection modules to explicitly recognise and remove the negative biases in language and vision modalities.
Boost Test-Time Performance with Closed-Loop Inference
TLDR
A general Closed-Loop Inference (CLI) method is proposed, which first devises a filtering criterion to identify hard-classified test samples that need additional inference loops and then constructs looped inference, so that the original erroneous predictions on these hard test samples can be corrected with little additional effort.
Test-Agnostic Long-Tailed Recognition by Test-Time Aggregating Diverse Experts with Self-Supervision
TLDR
This work proposes a new method, Test-time Aggregating Diverse Experts, with a skill-diverse expert learning strategy that trains diverse experts to excel at handling different class distributions from a single long-tailed training distribution, and theoretically shows that the method has a provable ability to simulate the test class distribution.
Source-free Domain Adaptation via Avatar Prototype Generation and Adaptation
TLDR
A new robust contrastive prototype adaptation strategy to align each pseudo-labeled target data to the corresponding source prototypes via contrastive learning is developed.

References

Showing 1-10 of 83 references
Conditional Negative Sampling for Contrastive Learning of Visual Representations
TLDR
This paper introduces a family of mutual information estimators that sample negatives conditionally -- in a "ring" around each positive -- and proves that these estimators lower-bound mutual information, with higher bias but lower variance than NCE.
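As a rough illustration of negatives sampled "in a ring" around each positive, the sketch below (an assumption-laden simplification, not the paper's estimator) keeps only negatives whose cosine similarity to the anchor falls between two quantiles before computing an InfoNCE-style loss; all names and thresholds are hypothetical.

```python
import torch
import torch.nn.functional as F

def ring_infonce(anchor, positive, negatives, temperature=0.1, lower=0.5, upper=0.9):
    """InfoNCE-style loss that conditions negatives on the anchor:
    only negatives whose similarity lies inside a quantile 'ring'
    (between the `lower` and `upper` quantiles) are kept.
    Shapes: anchor/positive (D,), negatives (N, D).
    """
    a = F.normalize(anchor, dim=0)
    p = F.normalize(positive, dim=0)
    n = F.normalize(negatives, dim=1)
    sims = n @ a                                     # (N,) anchor-negative similarities
    lo, hi = torch.quantile(sims, lower), torch.quantile(sims, upper)
    ring = sims[(sims >= lo) & (sims <= hi)]         # negatives inside the ring
    if ring.numel() == 0:                            # fallback for tiny batches
        ring = sims
    logits = torch.cat([(a @ p).reshape(1), ring]) / temperature
    return -F.log_softmax(logits, dim=0)[0]          # positive sits at index 0
```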
i-Mix: A Strategy for Regularizing Contrastive Representation Learning
TLDR
It is demonstrated that i-Mix consistently improves the quality of self-supervised representations across domains, resulting in significant performance gains on downstream tasks, and its regularization effect is confirmed via extensive ablation studies across model and dataset sizes.
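A minimal sketch of the input-and-virtual-label mixing idea behind i-Mix (simplified from the paper; `encoder`, `alpha`, and `temperature` are assumed placeholders): inputs of one view are mixed, and the instance-level targets of the contrastive loss are mixed with the same coefficient.

```python
import torch
import torch.nn.functional as F

def imix_contrastive_loss(encoder, x1, x2, alpha=1.0, temperature=0.2):
    """i-Mix-style sketch: mix inputs of one view and correspondingly mix the
    virtual (per-instance) labels of the contrastive loss.
    x1, x2: two augmented views of the same batch, shape (B, ...).
    """
    B = x1.size(0)
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(B, device=x1.device)
    x1_mix = lam * x1 + (1 - lam) * x1[perm]         # mix inputs of view 1

    z1 = F.normalize(encoder(x1_mix), dim=1)         # (B, D)
    z2 = F.normalize(encoder(x2), dim=1)             # (B, D)
    logits = z1 @ z2.t() / temperature               # instance-classification logits
    targets = torch.arange(B, device=x1.device)
    # mixed virtual labels: lam weight on the own instance, (1 - lam) on the mixed-in one
    return lam * F.cross_entropy(logits, targets) + (1 - lam) * F.cross_entropy(logits, targets[perm])
```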
ClusterFit: Improving Generalization of Visual Representations
TLDR
Clustering helps reduce pre-training task-specific information in the extracted features, thereby minimizing overfitting to the pre-training task, and significantly improves representation quality compared to state-of-the-art large-scale weakly-supervised image and video models and self-supervised image models.
Bi-tuning of Pre-trained Representations
TLDR
Bi-tuning is proposed, a general learning framework for fine-tuning both supervised and unsupervised pre-trained representations to downstream tasks, which achieves state-of-the-art results on fine-tuning tasks for both supervised and unsupervised pre-trained models by large margins.
Emerging Properties in Self-Supervised Vision Transformers
TLDR
This paper questions whether self-supervised learning provides new properties to Vision Transformers (ViT) that stand out compared to convolutional networks (convnets), and implements DINO, a form of self-distillation with no labels, highlighting the synergy between DINO and ViTs.
Mix-and-Match Tuning for Self-Supervised Semantic Segmentation
TLDR
With the proposed M&M approach, for the first time, a self-supervision method can achieve comparable or even better performance than its ImageNet pre-trained counterpart on both the PASCAL VOC2012 and Cityscapes datasets.
Unsupervised Feature Learning via Non-parametric Instance Discrimination
TLDR
This work formulates this intuition as a non-parametric classification problem at the instance level, and uses noise-contrastive estimation to tackle the computational challenges imposed by the large number of instance classes.
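For reference, instance discrimination treats every image as its own class; the non-parametric softmax over stored instance features $\mathbf{v}_j$ with temperature $\tau$ is typically written as

$$
P(i \mid \mathbf{v}) \;=\; \frac{\exp\!\big(\mathbf{v}_i^{\top}\mathbf{v} / \tau\big)}{\sum_{j=1}^{n} \exp\!\big(\mathbf{v}_j^{\top}\mathbf{v} / \tau\big)},
$$

where $\mathbf{v}$ is the feature of the query image; noise-contrastive estimation is then used to avoid summing over all $n$ instances.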
Large-Margin Softmax Loss for Convolutional Neural Networks
TLDR
A generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features and which not only can adjust the desired margin but also can avoid overfitting is proposed.
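Sketching the margin idea: for a sample $x_i$ with label $y_i$, L-Softmax replaces $\cos\theta_{y_i}$ in the target logit with a margin-tightened $\psi(\theta_{y_i})$ (notation follows the paper; $m$ is the margin):

$$
L_i = -\log \frac{e^{\|W_{y_i}\|\,\|x_i\|\,\psi(\theta_{y_i})}}
{e^{\|W_{y_i}\|\,\|x_i\|\,\psi(\theta_{y_i})} + \sum_{j \neq y_i} e^{\|W_j\|\,\|x_i\|\cos\theta_j}},
\qquad
\psi(\theta) = (-1)^k \cos(m\theta) - 2k,\;\; \theta \in \Big[\tfrac{k\pi}{m}, \tfrac{(k+1)\pi}{m}\Big].
$$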
Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning
TLDR
An in-depth empirical investigation into negative transfer in fine-tuning is launched and it is found that, for the weight parameters and feature representations, transferability of their spectral components is diverse.
Explicit Inductive Bias for Transfer Learning with Convolutional Networks
TLDR
This paper investigates several regularization schemes that explicitly promote the similarity of the final solution with the initial model, and eventually recommends a simple $L^2$ penalty toward the pre-trained model (used as a reference point) as the baseline penalty for transfer learning tasks.
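As a concrete (hypothetical) rendering of that recommendation, an "L2-SP"-style penalty keeps fine-tuned weights close to the pre-trained reference; the PyTorch sketch below assumes a frozen `pretrained_state` dict and a weight `alpha`.

```python
import torch

def l2_sp_penalty(model, pretrained_state, alpha=1e-2):
    """L2 penalty toward the pre-trained starting point: penalizes deviation
    of the current weights from the reference (pre-trained) solution."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in pretrained_state:                 # only weights shared with the pre-trained model
            penalty = penalty + (param - pretrained_state[name]).pow(2).sum()
    return 0.5 * alpha * penalty

# usage sketch: add the penalty to the task loss during fine-tuning
# pretrained_state = {k: v.clone().detach() for k, v in model.state_dict().items()}
# loss = task_loss + l2_sp_penalty(model, pretrained_state)
```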