Corpus ID: 198230819

Selfie: Self-Supervised Pretraining for Image Embedding

@inproceedings{Trinh2019SelfieSP,
  title={Selfie: Self-Supervised Pretraining for Image Embedding},
  author={Trieu H. Trinh and Minh-Thang Luong},
  year={2019}
}
We introduce a pretraining technique called Selfie, which stands for SELF-supervised Image Embedding. Selfie generalizes the concept of masked language modeling to continuous data, such as images. Given masked-out patches in an input image, our method learns to select the correct patch, among other “distractor” patches sampled from the same image, to fill in the masked location. This classification objective sidesteps the need for predicting exact pixel values of the target patches. The…
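The classification objective described in the abstract reduces to a softmax cross-entropy over similarity scores between the embedding computed for a masked location and a pool of candidate patch embeddings (the true patch plus distractors from the same image). Below is a minimal sketch of that loss, not the authors' code: the function name, tensor shapes, and the use of dot-product similarity in PyTorch are illustrative assumptions.

import torch
import torch.nn.functional as F

def patch_selection_loss(query, candidates, true_index):
    # query:      (batch, dim)            embedding predicted for the masked location
    # candidates: (batch, n_patches, dim) true patch + distractor patch embeddings
    # true_index: (batch,)                index of the correct patch in `candidates`
    # Dot-product similarity between the query and every candidate patch.
    logits = torch.einsum("bd,bnd->bn", query, candidates)
    # Softmax cross-entropy with the true patch as the label: a classification
    # objective that sidesteps regressing exact pixel values.
    return F.cross_entropy(logits, true_index)

# Toy usage with random embeddings:
batch, n_patches, dim = 4, 8, 32
loss = patch_selection_loss(
    torch.randn(batch, dim),
    torch.randn(batch, n_patches, dim),
    torch.randint(0, n_patches, (batch,)),
)
print(loss.item())

In practice the query would come from an attention-pooled encoding of the visible patches, as the paper's title and figure suggest; the random tensors above only exercise the loss.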

Citations

PatchFormer: A Neural Architecture for Self-Supervised Representation Learning on Images
  • 2019
Learning rich representations from predictive learning without labels has been a longstanding challenge in the field of machine learning. Generative pre-training has so far not been as successful as…
Towards a Hypothesis on Visual Transformation based Self-Supervision
The first qualitative hypothesis characterizing the behavior of visual transformation based self-supervision, called the VTSS hypothesis, is proposed; it states that if the predicted instantiations of the transformations are already present in the dataset, then the representation learned will be less useful.

References

Showing 1–10 of 39 references
Revisiting Self-Supervised Visual Representation Learning
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights about standard recipes for CNN design that do not always translate to self-supervised representation learning.
Context Encoders: Feature Learning by Inpainting
It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods.
Unsupervised Visual Representation Learning by Context Prediction
It is demonstrated that the feature representation learned using this within-image context indeed captures visual similarity across images and allows unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset.
Temporal Ensembling for Semi-Supervised Learning
Self-ensembling is introduced, and it is shown that the ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
Deep Residual Learning for Image Recognition
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth.
Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion
This work clearly establishes the value of using a denoising criterion as a tractable unsupervised objective to guide the learning of useful higher-level representations.
AutoAugment: Learning Augmentation Policies from Data
This paper describes a simple procedure called AutoAugment to automatically search for improved data augmentation policies, which achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
Unsupervised Representation Learning by Predicting Image Rotations
This work proposes to learn image features by training ConvNets to recognize the 2d rotation applied to the input image, and demonstrates both qualitatively and quantitatively that this apparently simple task provides a very powerful supervisory signal for semantic feature learning.
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
An unsupervised loss function is proposed that takes advantage of the stochastic nature of these methods and minimizes the difference between the predictions of multiple passes of a training sample through the network.
Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles
A novel unsupervised learning approach builds features suitable for object detection and classification; to facilitate the transfer of features to other tasks, the context-free network (CFN), a siamese-ennead convolutional neural network, is introduced.