Publications
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
TLDR
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
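As the title suggests, ViT treats an image as a sequence of 16x16 patches and feeds the flattened patches to a standard Transformer. Below is a minimal numpy sketch of that tokenization step, not the paper's code; the 224x224 input and patch size 16 correspond to the common ViT-Base setting.

```python
import numpy as np

def image_to_patch_tokens(image, patch=16):
    """Split an (H, W, C) image into non-overlapping patch x patch tiles
    and flatten each tile into one token vector, as in ViT's input step."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)        # group the two tile axes
    return tiles.reshape(-1, patch * patch * c)   # (num_patches, patch_dim)

tokens = image_to_patch_tokens(np.zeros((224, 224, 3)))
print(tokens.shape)  # (196, 768): 14x14 patches, each a 16*16*3 vector
```

Each token is then linearly projected and given a position embedding before entering the Transformer encoder.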
Big Transfer (BiT): General Visual Representation Learning
TLDR
By combining a few carefully selected components and transferring using a simple heuristic, Big Transfer achieves strong performance on over 20 datasets and performs well across a surprisingly wide range of data regimes, from 1 example per class to 1M total examples.
S4L: Self-Supervised Semi-Supervised Learning
TLDR
It is shown that S4L and existing semi-supervised methods can be jointly trained, yielding a new state-of-the-art result on semi-supervised ILSVRC-2012 with 10% of labels.
Revisiting Self-Supervised Visual Representation Learning
TLDR
This study revisits numerous previously proposed self-supervised models, conducts a thorough large-scale study, and uncovers multiple crucial insights, among them that standard recipes for CNN design do not always translate to self-supervised representation learning.
Self-Supervised GANs via Auxiliary Rotation Loss
TLDR
This work allows the networks to collaborate on the task of representation learning, while being adversarial with respect to the classic GAN game, and takes a step towards bridging the gap between conditional and unconditional GANs.
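The auxiliary task here is rotation prediction: every training image is rotated by a multiple of 90 degrees, and an extra discriminator head must recover which rotation was applied. A minimal sketch of how such pseudo-labelled batches can be built, as an illustrative numpy version rather than the authors' implementation:

```python
import numpy as np

def rotation_task(images):
    """Build the self-supervised auxiliary task: each (H, W, C) image is
    rotated by 0/90/180/270 degrees, and the rotation index becomes the
    pseudo-label the auxiliary classifier head must predict."""
    rotated, labels = [], []
    for img in images:
        for k in range(4):  # k quarter-turns counter-clockwise
            rotated.append(np.rot90(img, k=k, axes=(0, 1)))
            labels.append(k)
    return np.stack(rotated), np.array(labels)

batch, targets = rotation_task(np.zeros((8, 32, 32, 3)))
print(batch.shape, targets.shape)  # (32, 32, 32, 3) (32,)
```

Because the pseudo-labels come for free from the transformation itself, the discriminator gains a representation-learning signal without any class annotations.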
MLP-Mixer: An all-MLP Architecture for Vision
TLDR
It is shown that while convolutions and attention are both sufficient for good performance, neither is necessary: MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs), attains competitive scores on image classification benchmarks.
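A Mixer layer alternates two MLPs over a (patches x channels) token table: one applied across patch positions (token mixing) and one across channels (channel mixing), each with a residual connection. The sketch below is a stripped-down illustration of that description, with LayerNorm omitted; the hidden width `H` and the weight tuples are hypothetical names for this example, not the paper's API.

```python
import numpy as np

def gelu(x):
    """tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    """Two-layer perceptron applied along the last axis."""
    return gelu(x @ w1 + b1) @ w2 + b2

def mixer_block(x, token_mlp, channel_mlp):
    """One Mixer layer on x of shape (patches, channels): a token-mixing
    MLP across patch positions, then a channel-mixing MLP across features,
    each added back residually (LayerNorm omitted for brevity)."""
    x = x + mlp(x.T, *token_mlp).T   # mix information across patches
    return x + mlp(x, *channel_mlp)  # mix information across channels

rng = np.random.default_rng(0)
P, C, H = 196, 768, 256  # patches, channels, hidden width (illustrative sizes)
token_mlp = (0.02 * rng.normal(size=(P, H)), np.zeros(H),
             0.02 * rng.normal(size=(H, P)), np.zeros(P))
channel_mlp = (0.02 * rng.normal(size=(C, H)), np.zeros(H),
               0.02 * rng.normal(size=(H, C)), np.zeros(C))
print(mixer_block(rng.normal(size=(P, C)), token_mlp, channel_mlp).shape)  # (196, 768)
```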
Learning Cross-Media Joint Representation With Sparse and Semisupervised Regularization
TLDR
A novel feature learning algorithm for cross-media data, called joint representation learning (JRL), jointly explores correlation and semantic information in a unified optimization framework; it not only reduces the dimensionality of the original features but also incorporates the cross-media correlation into the final representation, further improving both cross-media and single-media retrieval.
The GAN Landscape: Losses, Architectures, Regularization, and Normalization
TLDR
This work reproduces the current state of the art of GANs from a practical perspective, discusses common pitfalls and reproducibility issues, and goes beyond it by fairly exploring the GAN landscape.
Are we done with ImageNet?
TLDR
A significantly more robust procedure for collecting human annotations of the ImageNet validation set is developed, which finds the original ImageNet labels to no longer be the best predictors of this independently collected set, indicating that their usefulness in evaluating vision models may be nearing an end.
Semi-Supervised Cross-Media Feature Learning With Unified Patch Graph Regularization
TLDR
A semi-supervised cross-media feature learning algorithm with unified patch graph regularization (S2UPG) that fully exploits cross-media unlabeled instances and their patches, which can increase the diversity of training data and boost the accuracy of cross-media retrieval.