Corpus ID: 237263357

PASS: An ImageNet replacement for self-supervised pretraining without humans

  title={PASS: An ImageNet replacement for self-supervised pretraining without humans},
  author={Yuki M. Asano and C. Rupprecht and Andrew Zisserman and Andrea Vedaldi},
Computer vision has long relied on ImageNet and other large datasets of images sampled from the Internet for pretraining models. However, these datasets have ethical and technical shortcomings, such as containing personal information taken without consent, unclear license usage, biases, and, in some cases, even problematic image content. On the other hand, state-of-the-art pretraining is nowadays obtained with unsupervised methods, meaning that labelled datasets such as ImageNet may not be… Expand


Auditing ImageNet: Towards a Model-driven Framework for Annotating Demographic Attributes of Large-Scale Image Datasets
This work introduces a model-driven framework for the automatic annotation of apparent age and gender attributes in large-scale image datasets and presents this work as the starting point for future development of unbiased annotation models and for the study of downstream effects of imbalances in the demographics of ImageNet. Expand
Are we done with ImageNet?
A significantly more robust procedure for collecting human annotations of the ImageNet validation set is developed, which finds the original ImageNet labels to no longer be the best predictors of this independently-collected set, indicating that their usefulness in evaluating vision models may be nearing an end. Expand
Mean Shift for Self-Supervised Learning
A simple mean-shift algorithm that learns representations by grouping images together without contrasting between them or adopting much of prior on the structure of the clusters is introduced. Expand
Self-Supervised Pretraining Improves Self-Supervised Pretraining
H Hierarchical PreTraining (HPT) is explored, which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model, and provides a simple framework for obtaining better pretrained representations with less computational resources. Expand
From ImageNet to Image Classification: Contextualizing Progress on Benchmarks
This work uses human studies to investigate the consequences of employing a noisy data collection pipeline and study how specific design choices in the ImageNet creation process impact the fidelity of the resulting dataset---including the introduction of biases that state-of-the-art models exploit. Expand
Emerging Properties in Self-Supervised Vision Transformers
This paper questions if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets), and implements DINO, a form of self-distillation with no labels, which is implemented into a simple self- supervised method. Expand
Context Encoders: Feature Learning by Inpainting
It is found that a context encoder learns a representation that captures not just appearance but also the semantics of visual structures, and can be used for semantic inpainting tasks, either stand-alone or as initialization for non-parametric methods. Expand
ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models
A highly automated platform that enables gathering datasets with controls at scale using automated tools throughout machine learning to generate datasets that exercise models in new ways thus providing valuable feedback to researchers is developed. Expand
Self-labelling via simultaneous clustering and representation learning
The proposed novel and principled learning formulation is able to self-label visual data so as to train highly competitive image representations without manual labels and yields the first self-supervised AlexNet that outperforms the supervised Pascal VOC detection baseline. Expand
Learning Multiple Layers of Features from Tiny Images
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network. Expand