
Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning

@inproceedings{Zhang2021UnleashingTP,
  title={Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning},
  author={Yifan Zhang and Bryan Hooi and D. Hu and Jian Liang and Jiashi Feng},
  booktitle={NeurIPS},
  year={2021}
}
Contrastive self-supervised learning (CSL) has attracted increasing attention for model pre-training via unlabeled data. The resulting CSL models provide instance-discriminative visual features that are uniformly scattered in the feature space. During deployment, the common practice is to directly fine-tune CSL models with cross-entropy, which, however, may not be the best strategy in practice. Although cross-entropy tends to separate inter-class features, the resulting models still have limited… 
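To make the idea concrete, the following is a minimal PyTorch sketch of contrast-regularized fine-tuning: plain cross-entropy on the classifier plus a supervised contrastive term on the backbone features, weighted by `lam`. This is an illustrative combination only, not the paper's exact Core-tuning objective, and the `encoder`/`classifier` modules are assumed placeholders.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Pull same-class features together, push different-class features apart."""
    features = F.normalize(features, dim=1)                      # (N, D) on the unit sphere
    sim = features @ features.t() / temperature                  # (N, N) pairwise similarities
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos_mask.fill_diagonal_(0)                                   # a sample is not its own positive
    logits_mask = torch.ones_like(pos_mask).fill_diagonal_(0)    # exclude self from the denominator
    exp_sim = torch.exp(sim) * logits_mask
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)                 # avoid division by zero
    return -(pos_mask * log_prob).sum(dim=1).div(pos_count).mean()

def fine_tune_step(encoder, classifier, images, labels, optimizer, lam=1.0):
    feats = encoder(images)                                      # features from the pre-trained CSL backbone
    logits = classifier(feats)
    loss = F.cross_entropy(logits, labels) + lam * supervised_contrastive_loss(feats, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```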
Multi-Modal Mixup for Robust Fine-tuning
TLDR
A new end-to-end fine-tuning method for robust representations is provided; it encourages better uniformity and alignment scores and fine-tunes the multi-modal model with contrastive learning on hard negative samples as well as normal negative and positive samples.
Neighborhood Consensus Contrastive Learning for Backward-Compatible Representation
TLDR
A Neighborhood Consensus Contrastive Learning (NCCL) method is proposed, which learns backward-compatible representation from a neighborhood consensus perspective with both embedding structures and discriminative knowledge, which ensures backward compatibility without impairing the accuracy of the new model.
Distance-based Hyperspherical Classification for Multi-source Open-Set Domain Adaptation
TLDR
This work tackles multi-source Open-Set domain adaptation by introducing HyMOS: a straightforward model that exploits the power of contrastive learning and the properties of its hyperspherical feature space to correctly predict known labels on the target, while rejecting samples belonging to any unknown class.
How Well Does Self-Supervised Pre-Training Perform with Streaming Data?
TLDR
This paper conducts the first thorough and dedicated investigation of self-supervised pre-training with streaming data, aiming to shed light on model behavior under this overlooked setup, and suggests that, in practice, cumbersome joint training can largely be replaced by sequential learning.
How Well Self-Supervised Pre-Training Performs with Streaming Data?
TLDR
Surprisingly, sequential self-supervised learning exhibits almost the same performance as the joint training when the distribution shifts within streaming data are mild, and is recommended as a more efficient yet performance-competitive representation learning practice for real-world applications.
Deep Long-Tailed Learning: A Survey
TLDR
A comprehensive survey of recent advances in deep long-tailed learning is provided, highlighting important applications of deep long-tailed learning and identifying several promising directions for future research.
Debiased Visual Question Answering from Feature and Sample Perspectives
TLDR
A method named D-VQA is proposed to alleviate the above challenges from the feature and sample perspectives, which applies two unimodal bias detection modules to explicitly recognise and remove the negative biases in language and vision modalities.
CrossCBR: Cross-view Contrastive Learning for Bundle Recommendation
TLDR
This work proposes to model the cooperative association between the two different views through cross-view contrastive learning by encouraging the alignment of the two separately learned views, so that each view can distill complementary information from the other view, achieving mutual enhancement.
Contrastive Learning for Cross-Domain Open World Recognition
TLDR
This work proposes the first learning approach that deals with all the previously mentioned challenges at once by exploiting a single contrastive objective and shows how it learns a feature space perfectly suitable to incrementally include new classes and is able to capture knowledge which generalizes across a variety of visual domains.
Boost Test-Time Performance with Closed-Loop Inference
TLDR
A general Closed-Loop Inference (CLI) method is proposed, which first devises a filtering criterion to identify hard-classified test samples that need additional inference loops and then constructs the looped inference, so that the original erroneous predictions on these hard test samples can be corrected with little additional effort.
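As an illustration only, a filtering criterion of the kind such a method needs could flag test samples whose maximum softmax confidence falls below a threshold as "hard" and route them to additional inference loops; the criterion and threshold below are assumptions, not the paper's actual rule.

```python
import torch
import torch.nn.functional as F

def split_easy_hard(model, images, threshold=0.9):
    """Return indices of confident ('easy') and low-confidence ('hard') test samples."""
    with torch.no_grad():
        probs = F.softmax(model(images), dim=1)
    confidence, _ = probs.max(dim=1)
    hard_idx = (confidence < threshold).nonzero(as_tuple=True)[0]   # send to extra inference loops
    easy_idx = (confidence >= threshold).nonzero(as_tuple=True)[0]  # keep original predictions
    return easy_idx, hard_idx
```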
...
...

References

SHOWING 1-10 OF 92 REFERENCES
Hard Negative Mixing for Contrastive Learning
TLDR
It is argued that an important aspect of contrastive learning, i.e., the effect of hard negatives, has so far been neglected and proposed hard negative mixing strategies at the feature level, that can be computed on-the-fly with a minimal computational overhead.
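As a rough illustration of feature-level hard negative mixing, the sketch below selects the negatives most similar to a query and synthesizes extra negatives by convex combinations on the unit sphere; the counts and mixing coefficients are illustrative choices, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def mix_hard_negatives(query, negatives, n_hard=16, n_synthetic=8):
    """query: (D,) unit vector; negatives: (K, D) unit vectors, K >= n_hard."""
    sims = negatives @ query                          # similarity of each negative to the query
    hard = negatives[sims.topk(n_hard).indices]       # the hardest (most similar) negatives
    # pick random pairs of hard negatives and mix them
    i = torch.randint(0, n_hard, (n_synthetic,))
    j = torch.randint(0, n_hard, (n_synthetic,))
    alpha = torch.rand(n_synthetic, 1)
    mixed = F.normalize(alpha * hard[i] + (1 - alpha) * hard[j], dim=1)  # back onto the unit sphere
    return torch.cat([negatives, mixed], dim=0)       # extended negative set
```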
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
TLDR
DenseCL is presented, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images and outperforms the state-of-the-art methods by a large margin.
Supervised Contrastive Learning
TLDR
A novel training methodology that consistently outperforms cross-entropy on supervised learning tasks across different architectures and data augmentations is proposed by modifying the batch contrastive loss, which has recently been shown to be very effective at learning powerful representations in the self-supervised setting.
Conditional Negative Sampling for Contrastive Learning of Visual Representations
TLDR
This paper introduces a family of mutual information estimators that sample negatives conditionally -- in a "ring" around each positive -- and proves that these estimators lower-bound mutual information, with higher bias but lower variance than NCE.
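A minimal sketch of ring-style conditional negative sampling: negatives are kept only if their similarity to the anchor falls between a lower and an upper percentile, forming a "ring" around the positive. The percentile values and the function name are assumptions for illustration.

```python
import torch

def ring_negatives(anchor, candidates, low_pct=0.5, high_pct=0.9):
    """anchor: (D,) unit vector; candidates: (K, D) unit vectors."""
    sims = candidates @ anchor
    lo = torch.quantile(sims, low_pct)                 # lower edge of the ring
    hi = torch.quantile(sims, high_pct)                # upper edge of the ring
    keep = (sims >= lo) & (sims <= hi)
    return candidates[keep]                            # negatives conditioned on the anchor
```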
i-Mix: A Strategy for Regularizing Contrastive Representation Learning
TLDR
It is demonstrated that i-Mix consistently improves the quality of self-supervised representations across domains, resulting in significant performance gains on downstream tasks, and its regularization effect is confirmed via extensive ablation studies across model and dataset sizes.
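A rough sketch of the i-Mix idea, assuming a standard instance-discrimination setup: pairs of inputs in a batch are mixed, and their virtual (one-hot, per-instance) labels are mixed the same way, after which the usual contrastive loss is computed with the soft labels as targets. The Beta(1, 1) coefficient and function name are illustrative.

```python
import torch

def i_mix_batch(x, alpha=1.0):
    """x: (N, ...) float tensor of anchor inputs; returns mixed inputs and soft virtual labels."""
    n = x.size(0)
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(n)
    x_mixed = lam * x + (1 - lam) * x[perm]
    labels = torch.eye(n)                              # each anchor's "class" is its batch index
    labels_mixed = lam * labels + (1 - lam) * labels[perm]
    return x_mixed, labels_mixed                       # use labels_mixed as soft targets in the contrastive loss
```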
A Simple Framework for Contrastive Learning of Visual Representations
TLDR
It is shown that composition of data augmentations plays a critical role in defining effective predictive tasks, and introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
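For reference, a compact NT-Xent (SimCLR-style) loss: each image contributes two augmented views, the positive for a view is its counterpart, and all other 2N − 2 samples in the batch serve as negatives. The temperature value is a common default, not prescribed here.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: (N, D) projections of two augmented views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)     # (2N, D)
    sim = z @ z.t() / temperature                           # (2N, 2N) similarities
    sim.fill_diagonal_(float('-inf'))                       # a sample never contrasts with itself
    n = z1.size(0)
    # the positive for sample i is its other view: i + N (or i - N)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```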
What Makes for Good Views for Contrastive Learning?
TLDR
This paper uses empirical analysis to better understand the importance of view selection, and argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning
TLDR
This work proposes a supervised contrastive learning (SCL) objective for the fine-tuning stage of natural language understanding classification models and demonstrates that the new objective leads to models that are more robust to different levels of noise in the training data, and can generalize better to related tasks with limited labeled task data.
Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
TLDR
This paper proposes an online algorithm, SwAV, that takes advantage of contrastive methods without requiring pairwise comparisons to be computed, and uses a swapped prediction mechanism in which it predicts the cluster assignment of a view from the representation of another view.
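A simplified swapped-prediction sketch in the spirit of SwAV: both views are scored against a set of prototypes, "codes" are derived for each view (here with a plain softmax; the actual method obtains codes with a Sinkhorn equipartition step, omitted for brevity), and each view is trained to predict the other view's codes.

```python
import torch
import torch.nn.functional as F

def swapped_prediction_loss(z1, z2, prototypes, temperature=0.1):
    """z1, z2: (N, D) normalized features; prototypes: (K, D) learnable cluster centers."""
    p = F.normalize(prototypes, dim=1)
    scores1, scores2 = z1 @ p.t(), z2 @ p.t()               # (N, K) prototype scores
    with torch.no_grad():                                    # codes act as fixed targets
        q1 = F.softmax(scores1 / temperature, dim=1)
        q2 = F.softmax(scores2 / temperature, dim=1)
    # swap: view 1 predicts view 2's code and vice versa
    loss = -(q2 * F.log_softmax(scores1 / temperature, dim=1)).sum(dim=1).mean() \
           - (q1 * F.log_softmax(scores2 / temperature, dim=1)).sum(dim=1).mean()
    return loss
```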
ClusterFit: Improving Generalization of Visual Representations
TLDR
Clustering helps remove pre-training-task-specific information from the extracted features, thereby minimizing overfitting to that task, and significantly improves representation quality compared to state-of-the-art large-scale weakly-supervised image and video models and self-supervised image models.
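A short sketch of the ClusterFit recipe under common assumptions: features extracted by the pre-trained network are clustered with k-means, and the resulting cluster assignments serve as pseudo-labels for training a fresh network from scratch. The cluster count is an illustrative choice.

```python
from sklearn.cluster import KMeans

def make_cluster_pseudo_labels(features, n_clusters=1000, seed=0):
    """features: (N, D) array extracted by the pre-trained model."""
    kmeans = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10)
    pseudo_labels = kmeans.fit_predict(features)    # (N,) cluster ids
    return pseudo_labels                            # then train a new model on (image, pseudo_label) pairs
```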
...
...