Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences

@inproceedings{Jing2021SelfsupervisedFL,
  title={Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences},
  author={Longlong Jing and Yucheng Chen and Ling Zhang and Mingyi He and Yingli Tian},
  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2021},
  pages={1581-1591}
}
The success of supervised learning requires large-scale ground-truth labels, which are expensive, time-consuming, or may require special skills to annotate. To address this issue, many self-supervised and unsupervised methods have been developed. Unlike most existing self-supervised methods, which learn only 2D image features or only 3D point cloud features, this paper presents a novel and effective self-supervised learning approach that jointly learns both 2D image features and 3D point cloud features by …
Cross-modal Center Loss
Proposes jointly training the components of a cross-modal retrieval framework, enabling the network to find optimal features by minimizing the distances between features of objects belonging to the same class across all modalities.
Pluggable Weakly-Supervised Cross-View Learning for Accurate Vehicle Re-Identification
Presents a pluggable Weakly-supervised Cross-View Learning (WCVL) module that can be seamlessly plugged into most existing vehicle ReID baselines for cross-view learning without re-training the baselines, and demonstrates its efficacy.

References

Showing 1–10 of 64 references
Self-supervised Modal and View Invariant Feature Learning
To learn modal- and view-invariant features from different modalities, including image, point cloud, and mesh, with heterogeneous networks, two types of constraints are proposed: a cross-modal invariance constraint and a cross-view invariance constraint.
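The cross-modal invariance constraint pulls together features of the same object computed from different modalities. A minimal illustrative sketch of such an alignment loss, assuming an InfoNCE-style formulation (this is not the paper's actual loss; the function name and formulation are assumptions):

```python
import numpy as np

def cross_modal_invariance_loss(feat_a, feat_b):
    """Illustrative alignment loss: given features of the same N objects in
    two modalities (both shaped (N, D)), treat matching rows as positives and
    all other rows as negatives, and maximize the log-probability of each
    matching pair under a row-wise softmax over cosine similarities."""
    a = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    b = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    logits = a @ b.T                       # pairwise cosine similarities
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))     # small when matching pairs agree

feats = np.eye(4, 8)                       # toy features, one object per row
aligned_loss = cross_modal_invariance_loss(feats, feats)
mismatched_loss = cross_modal_invariance_loss(feats, np.roll(feats, 1, axis=0))
```

Perfectly aligned features give a lower loss than mismatched ones, which is the behavior the invariance constraint relies on.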
Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey
Provides an extensive review of deep learning-based self-supervised visual feature learning methods, a subset of unsupervised learning methods that learn general image and video features from large-scale unlabeled data without using any human-annotated labels.
Self-supervised Spatiotemporal Feature Learning by Video Geometric Transformations
A novel 3D-ConvNet-based, fully self-supervised framework that learns spatiotemporal video features without any human-labeled annotations; it outperforms prior fully self-supervised methods on both UCF101 and HMDB51, achieving 62.9% and 33.7% accuracy respectively.
Self-Supervised Spatiotemporal Feature Learning via Video Rotation Prediction
With the self-supervised 3DRotNet pre-trained on large datasets, recognition accuracy is boosted by 20.4% on UCF101 and 16.7% on HMDB51 compared to models trained from scratch.
Scaling and Benchmarking Self-Supervised Visual Representation Learning
Shows that by scaling along various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation, and visual navigation using reinforcement learning.
Unsupervised Representation Learning by Predicting Image Rotations
Proposes learning image features by training ConvNets to recognize the 2D rotation applied to their input images, and demonstrates both qualitatively and quantitatively that this apparently simple task provides a very powerful supervisory signal for semantic feature learning.
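The data-generation side of the rotation pretext task can be sketched in plain NumPy. The 4-way rotation set (0°, 90°, 180°, 270°) follows the paper; the helper name and batch layout are illustrative:

```python
import numpy as np

def make_rotation_batch(images, rng):
    """Given a batch of images (N, H, W, C), rotate each image by a random
    multiple of 90 degrees and return the rotation index as the pseudo-label.
    The pretext task is then 4-way classification of that index."""
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k=int(k), axes=(0, 1))
                        for img, k in zip(images, labels)])
    return rotated, labels

rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32, 3))   # stand-in for real images
x, y = make_rotation_batch(batch, rng)
```

A classifier trained on `(x, y)` pairs must learn object orientation cues, which is the supervisory signal the paper exploits.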
Self-Supervised Deep Learning on Point Clouds by Reconstructing Space
Proposes a self-supervised task for deep learning on raw point cloud data in which a neural network is trained to reconstruct point clouds whose parts have been randomly rearranged; pre-training with this method before supervised training improves the performance of state-of-the-art models and significantly improves sample efficiency.
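A toy sketch of the part-rearrangement idea, assuming a simple 1-D split along the x-axis rather than the paper's actual spatial partitioning (the function name and chunking scheme are hypothetical):

```python
import numpy as np

def rearrange_parts(points, rng, n_parts=4):
    """Split a point cloud (N, 3) into n_parts chunks along the x-axis,
    shuffle the chunks' spatial positions, and return the displaced cloud
    plus the permutation used. The pretext task is to recover that
    permutation, i.e. reconstruct the original spatial arrangement."""
    order = np.argsort(points[:, 0])               # group points by x
    chunks = np.array_split(points[order], n_parts)
    centers = [c[:, 0].mean() for c in chunks]
    perm = rng.permutation(n_parts)
    moved = []
    for dst, src in enumerate(perm):
        # shift chunk `src` so it occupies the slot of chunk `dst`
        shift = np.array([centers[dst] - centers[src], 0.0, 0.0])
        moved.append(chunks[src] + shift)
    return np.concatenate(moved), perm

rng = np.random.default_rng(1)
cloud = rng.random((1024, 3))
shuffled, perm = rearrange_parts(cloud, rng)
```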
Revisiting Self-Supervised Visual Representation Learning
Revisits numerous previously proposed self-supervised models in a thorough large-scale study and uncovers multiple crucial insights, showing that standard recipes for CNN design do not always translate to self-supervised representation learning.
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
A novel unsupervised learning approach to building features suitable for object detection and classification; to facilitate transfer of features to other tasks, the context-free network (CFN), a siamese-ennead convolutional neural network, is introduced.
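The jigsaw pretext task can be sketched minimally: a 3x3 tiling with a small random permutation set standing in for the paper's fixed maximal-Hamming-distance permutation set (names and set size here are illustrative):

```python
import numpy as np

def make_jigsaw_sample(image, permutations, rng):
    """Cut a square image (H, W, C) into a 3x3 grid of tiles, reorder the
    tiles with a randomly chosen permutation from a fixed set, and return the
    shuffled tiles plus the permutation index as the pseudo-label."""
    h, w = image.shape[0] // 3, image.shape[1] // 3
    grid = [image[r*h:(r+1)*h, c*w:(c+1)*w]
            for r in range(3) for c in range(3)]
    idx = int(rng.integers(len(permutations)))
    shuffled = [grid[i] for i in permutations[idx]]
    return np.stack(shuffled), idx

rng = np.random.default_rng(2)
perms = [rng.permutation(9) for _ in range(10)]  # stand-in permutation set
img = rng.random((96, 96, 3))
tiles, label = make_jigsaw_sample(img, perms, rng)
```

The network then classifies which of the fixed permutations was applied, forcing it to reason about spatial relations between object parts.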
Context Encoders: Feature Learning by Inpainting
Finds that a context encoder learns a representation that captures not only appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
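Input preparation for the inpainting task can be sketched as masking a central region and keeping it as the regression target (a simplified version; the paper also uses random region masks, and the helper name is hypothetical):

```python
import numpy as np

def mask_center(image, frac=0.25):
    """Zero out a central square region of the image; the context encoder is
    trained to regress the missing pixels from the surrounding context."""
    h, w = image.shape[:2]
    mh, mw = int(h * frac), int(w * frac)
    top, left = (h - mh) // 2, (w - mw) // 2
    target = image[top:top+mh, left:left+mw].copy()  # ground truth to predict
    masked = image.copy()
    masked[top:top+mh, left:left+mw] = 0.0
    return masked, target

img = np.ones((32, 32, 3))
masked, target = mask_center(img)
```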