Multi-task Self-Supervised Visual Learning

@inproceedings{Doersch2017MultitaskSV,
  title={Multi-task Self-Supervised Visual Learning},
  author={Carl Doersch and Andrew Zisserman},
  booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
  year={2017},
  pages={2070-2079}
}
We investigate methods for combining multiple self-supervised tasks—i.e., supervised tasks where data can be collected without manual labeling—in order to train a single visual representation. […] First, we provide an apples-to-apples comparison of four different self-supervised tasks using the very deep ResNet-101 architecture. We then combine tasks to jointly train a network.
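As a rough illustration of the "combine tasks to jointly train a network" idea — a shared trunk feeding one head per self-supervised task, with the per-task losses summed — here is a toy sketch. The task names, dimensions, and linear layers are illustrative assumptions, not the paper's ResNet-101 implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "shared trunk": one linear map from input to a common feature space.
W_trunk = rng.normal(scale=0.1, size=(8, 4))  # 8-dim input -> 4-dim feature

# One linear "head" per self-supervised task (names are hypothetical).
heads = {
    "rotation": rng.normal(scale=0.1, size=(4, 4)),      # e.g. 4 rotations
    "colorization": rng.normal(scale=0.1, size=(4, 2)),  # e.g. 2 classes
}

def softmax_xent(logits, label):
    """Cross-entropy of a softmax over logits against an integer label."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

def multitask_loss(x, labels):
    """Compute the shared feature once, then sum the per-task losses."""
    feat = x @ W_trunk  # single shared representation
    return sum(softmax_xent(feat @ heads[t], y) for t, y in labels.items())

x = rng.normal(size=8)
loss = multitask_loss(x, {"rotation": 1, "colorization": 0})
```

Jointly minimizing this summed loss is what forces the trunk's representation to serve all tasks at once.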
Representation Learning with Video Deep InfoMax
TLDR
This paper finds that drawing views from both natural-rate sequences and temporally-downsampled sequences yields results on Kinetics-pretrained action recognition tasks which match or outperform prior state-of-the-art methods that use more costly large-time-scale transformer models.
A Novel Multi-Task Self-Supervised Representation Learning Paradigm
TLDR
Experimental results show that the proposed MTSS achieves better performance and robustness than other self-supervised learning methods on multiple image classification data sets, without using negative sample pairs or large batches.
MULTIPLE SELF-SUPERVISED AUXILIARY TASKS
  • Computer Science
  • 2020
TLDR
A general framework is proposed to improve graph-based neural network models by combining self-supervised auxiliary learning tasks in a multi-task fashion, using Graph Convolutional Networks as a building block and achieving competitive results on standard semi-supervised graph classification tasks.
Cross-Domain Self-Supervised Multi-task Feature Learning Using Synthetic Imagery
TLDR
A novel multi-task deep network is proposed to learn generalizable high-level visual representations based on adversarial learning, and it is demonstrated that the network learns more transferable representations than single-task baselines.
Boosting Supervised Learning Performance with Co-training
TLDR
This paper introduces a simple and flexible multi-task co-training framework that integrates a self-supervised task into any supervised task, and demonstrates strong domain adaptation capability when used with additional unlabeled data.
Self-labelling via simultaneous clustering and representation learning
TLDR
The proposed novel and principled learning formulation is able to self-label visual data so as to train highly competitive image representations without manual labels and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline.
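The self-labelling formulation summarized above hinges on keeping the pseudo-label clusters balanced, which the paper solves with Sinkhorn–Knopp-style iterations. A minimal sketch of that balancing step (sizes and names are illustrative, not the paper's code):

```python
import numpy as np

def balanced_assignments(scores, n_iters=300):
    """Sinkhorn-Knopp-style balancing of soft cluster assignments.

    Rows (samples) are normalized to sum to 1; columns (clusters) are
    pushed toward an equal share of N/K samples each, so no cluster
    can absorb all the data.
    """
    Q = np.exp(scores)  # strictly positive matrix from model scores
    N, K = Q.shape
    for _ in range(n_iters):
        Q = Q / Q.sum(axis=0, keepdims=True) * (N / K)  # equal-size clusters
        Q = Q / Q.sum(axis=1, keepdims=True)            # one unit per sample
    return Q

rng = np.random.default_rng(0)
scores = rng.normal(size=(6, 3))     # 6 samples, 3 clusters
Q = balanced_assignments(scores)
pseudo_labels = Q.argmax(axis=1)     # hard labels for the next training round
```

The alternation converges because each step only rescales rows or columns of a positive matrix; the resulting hard labels feed the next round of representation learning.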
Scaling and Benchmarking Self-Supervised Visual Representation Learning
TLDR
It is shown that by scaling on various axes (including data size and problem 'hardness'), one can largely match or even exceed the performance of supervised pre-training on a variety of tasks such as object detection, surface normal estimation and visual navigation using reinforcement learning.
Metric-Based Regularization and Temporal Ensemble for Multi-Task Learning using Heterogeneous Unsupervised Tasks
TLDR
Experimental results on three target tasks (classification, object detection, and embedding clustering) show that the TTE-based multi-task framework is more effective than the state-of-the-art (SOTA) method at improving the performance of a target task.
Multi-Task Learning for Dense Prediction Tasks: A Survey
TLDR
This survey provides a well-rounded view of state-of-the-art deep learning approaches for MTL in computer vision, with explicit emphasis on dense prediction tasks.
Revisiting Self-Supervised Visual Representation Learning
TLDR
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights, including that standard recipes for CNN design do not always translate to self-supervised representation learning.

References

Showing 1–10 of 52 references
Cross-Stitch Networks for Multi-task Learning
TLDR
This paper proposes a principled approach to learning shared representations in Convolutional Networks for multi-task learning, using a new sharing unit: the "cross-stitch" unit, which combines the activations from multiple networks and can be trained end-to-end.
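The cross-stitch unit is a concrete mechanism: at a given layer, each task network's next input is a learned linear combination of both tasks' activations. A tiny sketch with fixed, illustrative mixing weights (in the paper these weights are learned end-to-end):

```python
import numpy as np

def cross_stitch(x_a, x_b, alpha):
    """Linearly mix the activations of two task networks at one layer.

    alpha is a 2x2 matrix of mixing weights: the diagonal keeps each
    task's own activations, the off-diagonal shares across tasks.
    """
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b

x_a = np.array([1.0, 2.0])   # activations from task A's network
x_b = np.array([3.0, 4.0])   # activations from task B's network
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])  # mostly task-specific, a little sharing
a2, b2 = cross_stitch(x_a, x_b, alpha)
```

Setting alpha close to the identity recovers two independent networks, while more uniform weights approach full sharing, so the degree of sharing becomes a learnable quantity.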
Shuffle and Learn: Unsupervised Learning Using Temporal Order Verification
TLDR
This paper presents an approach for learning a visual representation from the raw spatiotemporal signals in videos using a Convolutional Neural Network, and shows that the method captures temporally varying information, such as human pose.
Unsupervised Visual Representation Learning by Context Prediction
TLDR
It is demonstrated that the feature representation learned using this within-image context indeed captures visual similarity across images and allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset.
UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory
  • Iasonas Kokkinos
  • Computer Science
    2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
In this work we train in an end-to-end manner a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture. Such a network can act like […]
Unsupervised Learning of Visual Representations Using Videos
  • X. Wang, A. Gupta
  • Computer Science
    2015 IEEE International Conference on Computer Vision (ICCV)
  • 2015
TLDR
A simple yet surprisingly powerful approach for unsupervised learning of CNNs that uses hundreds of thousands of unlabeled videos from the web to learn visual representations, and designs a Siamese triplet network with a ranking loss function to train the CNN representation.
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
TLDR
A novel unsupervised learning approach is introduced to build features suitable for object detection and classification: the context-free network (CFN), a siamese-ennead convolutional neural network designed to facilitate the transfer of features to other tasks.
Context Encoders: Feature Learning by Inpainting
TLDR
It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
Self-Supervised Video Representation Learning with Odd-One-Out Networks
TLDR
A new self-supervised CNN pre-training technique based on a novel auxiliary task called odd-one-out learning, which learns temporal representations for videos that generalizes to other related tasks such as action recognition.
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
TLDR
This integrated framework for using Convolutional Networks for classification, localization and detection is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 and obtained very competitive results on the detection and classification tasks.
Deeper Depth Prediction with Fully Convolutional Residual Networks
TLDR
A fully convolutional architecture, encompassing residual learning, to model the ambiguous mapping between monocular images and depth maps is proposed and a novel way to efficiently learn feature map up-sampling within the network is presented.