Corpus ID: 247158785

Understanding Contrastive Learning Requires Incorporating Inductive Biases

@inproceedings{Saunshi2022UnderstandingCL,
  title={Understanding Contrastive Learning Requires Incorporating Inductive Biases},
  author={Nikunj Saunshi and Jordan T. Ash and Surbhi Goel and Dipendra Kumar Misra and Cyril Zhang and Sanjeev Arora and Sham M. Kakade and Akshay Krishnamurthy},
  booktitle={ICML},
  year={2022}
}
Contrastive learning is a popular form of self-supervised learning that encourages augmentations (views) of the same input to have more similar representations compared to augmentations of different inputs. Recent attempts to theoretically explain the success of contrastive learning on downstream classification tasks prove guarantees that depend on properties of the augmentations and the value of the contrastive loss of the representations. We demonstrate that such analyses, which ignore inductive biases of… 
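For context, below is a minimal sketch (not code from the paper) of the InfoNCE-style contrastive objective the abstract refers to: two augmented views of each input are pulled together while views of the other inputs in the batch act as negatives. The encoder, augmentation, temperature, and variable names are illustrative assumptions.

import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    # z1, z2: (batch, dim) representations of two augmented views of the same inputs
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                     # cross-view similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# usage (hypothetical encoder f and augmentation aug): loss = info_nce_loss(f(aug(x)), f(aug(x)))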
Understanding the Role of Nonlinearity in Training Dynamics of Contrastive Learning
TLDR
It is suggested that models with many parameters can be regarded as a brute-force way to find the local optima induced by nonlinearity, which may be an underlying reason why empirical observations such as the lottery ticket hypothesis hold.
Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering
TLDR
The algorithmic pipeline in Orchestra guarantees good generalization performance under a linear probe, allowing it to outperform alternative techniques in a broad range of conditions, including variation in heterogeneity, number of clients, participation ratio, and local epochs.
Rethinking Positive Sampling for Contrastive Learning with Kernel
TLDR
This work proposes a new way to define positive samples using kernel theory along with a novel loss called decoupled uniformity, and draws a connection between contrastive learning and the conditional mean embedding theory to derive tight bounds on the downstream classification loss.
Analyzing Data-Centric Properties for Contrastive Learning on Graphs
TLDR
This work rigorously contextualizes the effects of data-centric properties on augmentation strategies and learning paradigms for graph SSL, and finds that CAAs induce better invariance and separability than GGAs in this setting.
Do More Negative Samples Necessarily Hurt in Contrastive Learning?
TLDR
It is shown in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation optimizing the population contrastive loss in fact does not degrade with the number of negative samples.

References

SHOWING 1-10 OF 49 REFERENCES
Contrastive Learning Inverts the Data Generating Process
TLDR
It is proved that feed-forward models trained with objectives belonging to the commonly used InfoNCE family learn to implicitly invert the underlying generative model of the observed data.
What Makes for Good Views for Contrastive Learning?
TLDR
This paper uses empirical analysis to better understand the importance of view selection, argues that the mutual information (MI) between views should be reduced while keeping task-relevant information intact, and devises unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
Investigating the Role of Negatives in Contrastive Representation Learning
TLDR
Theoretically, the existence of a collision-coverage trade-off is shown, suggesting that the optimal number of negative examples should scale with the number of underlying concepts in the data; empirically, the results broadly agree with the theory and suggest future directions for better aligning theory and practice.
A Theoretical Analysis of Contrastive Unsupervised Representation Learning
TLDR
This framework allows us to show provable guarantees on the performance of the learned representations on the average classification task comprising a subset of the same set of latent classes, and shows that the learned representations can reduce (labeled) sample complexity on downstream tasks.
Can contrastive learning avoid shortcut solutions?
The generalization of representations learned via contrastive learning depends crucially on what features of the data are extracted. However, we observe that the contrastive loss does not always…
Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning
TLDR
It is proved that contrastive learning with ReLU networks learns the desired sparse features if proper augmentations are adopted, and an underlying principle called feature decoupling is presented to explain the effects of augmentations.
A Simple Framework for Contrastive Learning of Visual Representations
TLDR
It is shown that the composition of data augmentations plays a critical role in defining effective predictive tasks, that introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and that contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning.
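As a concrete illustration of the "learnable nonlinear transformation" mentioned in this summary, here is a hedged sketch of a SimCLR-style two-layer MLP projection head; the dimensions and module names are assumptions, and the head is typically discarded after pretraining when the representation is evaluated.

import torch.nn as nn

class ProjectionHead(nn.Module):
    # maps encoder features h to the space z where the contrastive loss is applied
    def __init__(self, in_dim=2048, hidden_dim=2048, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, h):
        return self.net(h)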
Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss
TLDR
This work proposes a loss that performs spectral decomposition on the population augmentation graph and can be succinctly written as a contrastive learning objective on neural net representations, leading to features with provable accuracy guarantees under linear probe evaluation.
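For reference, a minimal sketch of the spectral contrastive loss described in this entry: positive pairs are rewarded for high inner product while independent pairs are penalized quadratically. The minibatch estimator and variable names below are assumptions, not the authors' implementation.

import torch

def spectral_contrastive_loss(z1, z2):
    # z1, z2: (batch, dim) features of two views; approximates
    # -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2]
    batch = z1.size(0)
    pos = -2.0 * (z1 * z2).sum(dim=1).mean()
    sim = z1 @ z2.t()
    off_diag = sim[~torch.eye(batch, dtype=torch.bool, device=z1.device)]
    neg = off_diag.pow(2).mean()
    return pos + neg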
Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere
TLDR
This work identifies two key properties related to the contrastive loss: alignment (closeness) of features from positive pairs, and uniformity of the induced distribution of the (normalized) features on the hypersphere.
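A short sketch of the two properties named here, assuming L2-normalized features on the hypersphere; the default exponents below (alpha=2, t=2) are common choices and are assumptions rather than details taken from this page.

import torch

def align_loss(z1, z2, alpha=2):
    # alignment: expected distance between features of positive pairs
    return (z1 - z2).norm(p=2, dim=1).pow(alpha).mean()

def uniform_loss(z, t=2):
    # uniformity: log of the average Gaussian potential over all pairs in the batch
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()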
Intriguing Properties of Contrastive Losses
TLDR
This work generalizes the standard cross-entropy-based contrastive loss to a broader family of losses that share an abstract form in which hidden representations are encouraged to be aligned under some transformations/augmentations and to match a prior distribution of high entropy, and studies an intriguing phenomenon of feature suppression among competing features shared across augmented views.
...