Corpus ID: 232478851

Composable Augmentation Encoding for Video Representation Learning

@article{Sun2021ComposableAE,
  title={Composable Augmentation Encoding for Video Representation Learning},
  author={Chen Sun and Arsha Nagrani and Yonglong Tian and Cordelia Schmid},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.00616}
}
We focus on contrastive methods for self-supervised video representation learning. A common paradigm in contrastive learning is to construct positive pairs by sampling different data views of the same instance, with different data instances serving as negatives. These methods implicitly assume a set of representational invariances to the view-selection mechanism (e.g., sampling frames with temporal shifts), which may lead to poor performance on downstream tasks that violate these invariances (fine…
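The positive/negative construction described above is typically trained with an InfoNCE-style contrastive loss: each anchor embedding is pulled toward its positive view and pushed away from the other instances in the batch. A minimal NumPy sketch, assuming L2-normalized embeddings and in-batch negatives (this is a generic illustration, not the paper's implementation):

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss sketch.

    anchors, positives: (N, D) arrays; row i of `positives` is the
    positive view for row i of `anchors`, and all other rows act as
    in-batch negatives. Hypothetical helper, for illustration only.
    """
    # L2-normalize so dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    logits = a @ p.T / temperature                  # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability

    # Cross-entropy with the positive on the diagonal.
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    return -log_softmax[idx, idx].mean()
```

Sanity check: when each positive is a lightly perturbed copy of its anchor, the loss is lower than when the positives are shuffled across instances, which is the behavior the contrastive objective relies on.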

