Probabilistic Representations for Video Contrastive Learning
@article{Park2022ProbabilisticRF,
  title={Probabilistic Representations for Video Contrastive Learning},
  author={Jungin Park and Jiyoung Lee and Ig-Jae Kim and Kwanghoon Sohn},
  journal={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
  pages={14691-14701}
}
This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation. We hypothesize that the clips composing a video follow different distributions over their short durations, but that their combination in a common embedding space can represent the complicated and sophisticated distribution of the whole video. Thus, the proposed method represents video clips as normal distributions and combines them into…
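The released implementation is not reproduced here, but a minimal sketch of the idea described in the abstract follows: each clip feature is mapped to the mean and variance of a normal distribution, the clip-level distributions are combined into a video-level representation by reparameterized sampling and averaging, and an InfoNCE loss is applied to the sampled embeddings. All module and function names (`ProbabilisticClipHead`, `sample_video_embeddings`, `stochastic_info_nce`) and the specific combination and loss details are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch based only on the abstract above; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticClipHead(nn.Module):
    """Maps a clip feature to the mean and log-variance of a normal distribution."""
    def __init__(self, feat_dim=2048, embed_dim=128):
        super().__init__()
        self.mu = nn.Linear(feat_dim, embed_dim)
        self.logvar = nn.Linear(feat_dim, embed_dim)

    def forward(self, clip_feat):
        return self.mu(clip_feat), self.logvar(clip_feat)

def sample_video_embeddings(mu, logvar, num_samples=8):
    """Draw reparameterized samples from each clip distribution and average over clips,
    treating the video as a combination of its clip-level normal distributions."""
    # mu, logvar: (batch, num_clips, dim)
    std = torch.exp(0.5 * logvar)
    eps = torch.randn(num_samples, *mu.shape, device=mu.device)
    samples = mu.unsqueeze(0) + eps * std.unsqueeze(0)   # (S, B, C, D)
    return samples.mean(dim=2)                            # (S, B, D): video-level samples

def stochastic_info_nce(query_samples, key_samples, temperature=0.1):
    """InfoNCE loss averaged over the sampled embeddings of two augmented views."""
    loss = 0.0
    for q, k in zip(query_samples, key_samples):          # iterate over the S samples
        q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
        logits = q @ k.t() / temperature                  # (B, B) similarity matrix
        labels = torch.arange(q.size(0), device=q.device) # positives on the diagonal
        loss = loss + F.cross_entropy(logits, labels)
    return loss / len(query_samples)
```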
4 Citations
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
- Computer Science, ArXiv
- 2022
This work proposes a novel formulation of contrastive learning, called Similarity Contrastive Estimation (SCE), that uses the semantic similarity between instances as a soft target, and shows that SCE reaches state-of-the-art results for video representation pretraining and that the learned representations generalize to downstream video tasks.
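For intuition, a minimal PyTorch sketch of a soft contrastive objective in the spirit of SCE follows: the one-hot InfoNCE target is relaxed by mixing it with an inter-instance similarity distribution computed from the key embeddings. The mixing coefficient and temperatures are illustrative assumptions, not the paper's settings.

```python
# Sketch of a soft contrastive loss; hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def soft_contrastive_loss(q, k, lam=0.5, tau=0.1, tau_t=0.07):
    q = F.normalize(q, dim=-1)             # (B, D) query embeddings
    k = F.normalize(k, dim=-1)             # (B, D) key embeddings (e.g., momentum encoder)
    logits = q @ k.t() / tau               # student similarities
    with torch.no_grad():
        sim = k @ k.t() / tau_t            # inter-instance similarities from the keys
        sim.fill_diagonal_(float('-inf'))  # exclude self-similarity from the soft part
        soft = F.softmax(sim, dim=-1)
        one_hot = torch.eye(q.size(0), device=q.device)
        target = lam * one_hot + (1.0 - lam) * soft   # relaxed (soft) contrastive target
    return torch.sum(-target * F.log_softmax(logits, dim=-1), dim=-1).mean()
```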
UATVR: Uncertainty-Adaptive Text-Video Retrieval
- Computer Science, ArXiv
- 2023
This paper proposes an Uncertainty-Adaptive Text-Video Retrieval approach, termed UATVR, which models each lookup as a distribution matching procedure and adds learnable tokens to the encoders to adaptively aggregate multi-grained semantics for high-level reasoning.
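A small sketch of the learnable-token idea mentioned above, assuming a standard Transformer encoder: a few extra learnable tokens are appended to the input sequence so that their outputs can aggregate multi-grained semantics. The class name, token count, and dimensions are illustrative assumptions.

```python
# Sketch of appending learnable aggregation tokens to an encoder input; all names
# and hyperparameters are assumptions, not the UATVR implementation.
import torch
import torch.nn as nn

class TokenAugmentedEncoder(nn.Module):
    def __init__(self, dim=512, num_extra_tokens=4, num_layers=4, num_heads=8):
        super().__init__()
        self.extra_tokens = nn.Parameter(torch.randn(1, num_extra_tokens, dim) * 0.02)
        layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        # x: (batch, seq_len, dim) frame or word embeddings
        extra = self.extra_tokens.expand(x.size(0), -1, -1)
        out = self.encoder(torch.cat([x, extra], dim=1))
        return out[:, x.size(1):]   # outputs of the extra tokens act as aggregated semantics
```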
Boosting Semi-Supervised Semantic Segmentation with Probabilistic Representations
- Computer Science, ArXiv
- 2022
A Probabilistic Representation Contrastive Learning (PRCL) framework is proposed that improves representation quality by taking the probability of each representation into consideration and can tune the contribution of ambiguous representations to tolerate the risk of inaccurate pseudo-labels.
Representing Spatial Trajectories as Distributions
- Computer Science, ArXiv
- 2022
A representation learning framework for spatial trajectories that can accurately predict the past and future of a trajectory segment, as well as the interpolation between two different segments, outperforming autoregressive baselines and supporting sampling from a trajectory at any continuous point in time.
References
Showing 1-10 of 90 references
Spatiotemporal Contrastive Video Representation Learning
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
This work proposes a temporally consistent spatial augmentation method to impose strong spatial augmentations on each frame of the video while maintaining the temporal consistency across frames, and proposes a sampling-based temporal augmentation method to avoid overly enforcing invariance on clips that are distant in time.
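A minimal sketch of the temporally consistent spatial augmentation described above: crop and flip parameters are sampled once per clip and reused for every frame, so the spatial content is strongly augmented while temporal consistency across frames is preserved. Parameter ranges are illustrative assumptions (frames are assumed larger than the output size).

```python
# Sketch of temporally consistent spatial augmentation; details are assumptions.
import random
import torchvision.transforms.functional as TF

def consistent_spatial_augment(frames, out_size=224):
    # frames: list of PIL images belonging to one clip, each larger than out_size
    top = random.randint(0, frames[0].height - out_size)
    left = random.randint(0, frames[0].width - out_size)
    flip = random.random() < 0.5
    out = []
    for f in frames:
        f = TF.crop(f, top, left, out_size, out_size)  # same crop window for every frame
        if flip:
            f = TF.hflip(f)                            # same flip decision for every frame
        out.append(f)
    return out
```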
Time-Equivariant Contrastive Video Representation Learning
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
A novel self-supervised contrastive learning method that learns representations from unlabelled videos which are equivariant to temporal transformations and better capture video dynamics, achieving state-of-the-art results on video retrieval and action recognition benchmarks.
TCLR: Temporal Contrastive Learning for Video Representation
- Computer Science, Comput. Vis. Image Underst.
- 2022
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
- Computer Science, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
This paper improves the temporal feature representations of MoCo from two perspectives: empowering the temporal robustness of the encoder and modeling the temporal decay of the keys within the contrastive learning framework.
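As a rough illustration of the temporal-decay idea, the sketch below down-weights older keys in a MoCo-style queue so that stale negatives contribute less to the contrastive loss. The exponential decay schedule and tensor shapes are assumptions, not the paper's exact formulation.

```python
# Sketch of down-weighting aged queue keys in a MoCo-style loss; the decay schedule
# is an illustrative assumption.
import torch
import torch.nn.functional as F

def moco_loss_with_key_decay(q, k_pos, queue, queue_age, tau=0.07, decay=0.99):
    q = F.normalize(q, dim=-1)                        # (B, D) query embeddings
    k_pos = F.normalize(k_pos, dim=-1)                # (B, D) positive keys
    l_pos = (q * k_pos).sum(-1, keepdim=True) / tau   # (B, 1) positive logits
    weights = decay ** queue_age                      # (K,) decay by key age in iterations
    l_neg = (q @ queue.t()) * weights / tau           # (B, K) decayed negative logits
    logits = torch.cat([l_pos, l_neg], dim=1)
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```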
Active Contrastive Learning of Audio-Visual Video Representations
- Computer Science, ICLR
- 2021
An active contrastive learning approach that builds an actively sampled dictionary with diverse and informative items, which improves the quality of negative samples and improves performance on tasks where there is high mutual information in the data, e.g., video classification.
Contrast and Order Representations for Video Self-supervised Learning
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
A contrast-and-order representation (CORP) framework for learning self-supervised video representations that can automatically capture both the appearance information within each frame and the temporal information across different frames, along with a novel decoupling attention method to learn symmetric similarity (contrast) and anti-symmetric patterns (order).
Composable Augmentation Encoding for Video Representation Learning
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
It is shown that representations learned by the proposed 'augmentation aware' contrastive learning framework encode valuable information about the specified spatial or temporal augmentations, and in doing so also achieve state-of-the-art performance on a number of video benchmarks.
Motion-Focused Contrastive Learning of Video Representations
- Computer Science, 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
- 2021
A Motion-focused Contrastive Learning method that capitalizes on the optical flow of each frame in a video to temporally and spatially sample tubelets as data augmentations, and aligns the gradient maps of the convolutional layers to optical flow maps from spatial, temporal, and spatio-temporal perspectives in order to ground motion information in feature learning.
Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion
- Computer Science, AAAI
- 2021
This work proposes to decouple the scene and the motion (DSM) with two simple operations, so that the model pays more attention to motion information, the impact of the scene is weakened, and the temporal sensitivity of the network is further enhanced.
Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning
- Computer Science, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
Geometry is explored as a new type of auxiliary supervision for the self-supervised learning of video representations, and it is found that convolutional neural networks pre-trained with geometry cues can be effectively adapted to semantic video understanding tasks.