Corpus ID: 189928444

Fixing the train-test resolution discrepancy

@inproceedings{Touvron2019FixingTT,
  title={Fixing the train-test resolution discrepancy},
  author={Hugo Touvron and Andrea Vedaldi and Matthijs Douze and Herv{\'e} J{\'e}gou},
  booktitle={NeurIPS},
  year={2019}
}
Data-augmentation is key to the training of neural networks for image classification. [...] Key Method: It involves only a computationally cheap fine-tuning of the network at the test resolution. This enables training strong classifiers using small training images. For instance, we obtain 77.1% top-1 accuracy on ImageNet with a ResNet-50 trained on 128x128 images, and 79.8% with one trained on 224x224 images. In addition, if we use extra training data, we get 82.5% with the ResNet-50 trained with 224x224 images.
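As a rough illustration of the key method, the sketch below fine-tunes a classifier at the test resolution after low-resolution training, assuming PyTorch and torchvision; the dataset path, crop ratio, and hyper-parameters are placeholders rather than the paper's exact recipe.

```python
# Minimal sketch (assumptions: PyTorch + torchvision; the dataset path and
# hyper-parameters are placeholders, not the authors' exact settings).
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

test_res = 224  # the network is assumed to have been trained at a lower resolution, e.g. 128

# Preprocessing at the (higher) test resolution.
finetune_tf = transforms.Compose([
    transforms.Resize(int(test_res * 1.15)),
    transforms.CenterCrop(test_res),
    transforms.ToTensor(),
])

# Hypothetical ImageNet-style folder; swap in the real training set.
dataset = datasets.ImageFolder("/path/to/train", transform=finetune_tf)
loader = torch.utils.data.DataLoader(dataset, batch_size=64, shuffle=True)

model = models.resnet50(weights=None)  # assume low-resolution training weights are loaded here

# Cheap fine-tuning: only the classifier (and, in the paper, the batch-norm
# statistics) are adapted to the new resolution.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True

opt = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()  # keeps BatchNorm statistics updating at the test resolution
for images, labels in loader:
    opt.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    opt.step()
```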
Citations

High-Performance Large-Scale Image Recognition Without Normalization
TLDR: An adaptive gradient clipping technique is developed which overcomes the instabilities associated with removing batch normalization, and a significantly improved class of Normalizer-Free ResNets is designed that attains significantly better performance when fine-tuning on ImageNet.
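The adaptive gradient clipping rule rescales a gradient whenever its norm grows too large relative to the corresponding parameter norm. Below is a simplified per-tensor sketch in PyTorch (the paper applies the rule unit-wise, and the clip value 0.01 is only a placeholder).

```python
import torch

def adaptive_grad_clip(parameters, clip=0.01, eps=1e-3):
    """Scale each gradient so that ||g|| / max(||w||, eps) <= clip.

    Simplified per-tensor variant; the paper applies the rule unit-wise
    (per output row of each weight matrix).
    """
    for p in parameters:
        if p.grad is None:
            continue
        w_norm = p.detach().norm().clamp_min(eps)
        g_norm = p.grad.detach().norm()
        max_norm = clip * w_norm
        if g_norm > max_norm:
            p.grad.mul_(max_norm / (g_norm + 1e-6))

# Usage (hypothetical training step): call between backward() and step().
# loss.backward()
# adaptive_grad_clip(model.parameters(), clip=0.01)
# optimizer.step()
```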
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency
TLDR: It is demonstrated that models trained using the described mixed-size training regime are more resilient to image size changes and generalize well even on small images, which allows faster inference by using smaller images at test time.
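A mixed-size regime can be approximated by resizing each training batch to a resolution sampled from a small set, so the network never sees a single fixed size. The sketch below is a rough illustration under that assumption, not the authors' exact schedule.

```python
import random
import torch
import torch.nn.functional as F

SIZES = [128, 160, 192, 224]  # candidate training resolutions (illustrative)

def train_step(model, optimizer, criterion, images, labels):
    # Rescale the whole batch to a randomly sampled resolution.
    size = random.choice(SIZES)
    images = F.interpolate(images, size=(size, size),
                           mode="bilinear", align_corners=False)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```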
Compensating for the Lack of Extra Training Data by Learning Extra Representation
TLDR: A novel framework, Extra Representation (ExRep), is introduced to surmount the problem of not having access to the JFT-300M data by instead using ImageNet and the publicly available model that has been pre-trained on JFT-300M.
MixSize: Training ConvNets with Mixed Image Sizes (2020)
Convolutional neural networks (CNNs) are commonly trained using a fixed spatial image size predetermined for a given model. Although trained on images of a specific size, it is well established that…
Scale Calibrated Training: Improving Generalization of Deep Networks via Scale-Specific Normalization
TLDR: A novel normalization scheme called Scale-Specific Batch Normalization is used in SCT in place of standard batch normalization, which improves the accuracy of a single ResNet-50 on ImageNet by 1.7% and 11.5% when testing at image sizes of 224 and 128, respectively.
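Keeping separate normalization statistics per training scale can be sketched as a module that owns one BatchNorm2d per resolution and dispatches on the scale of the current batch; the class and argument names below are made up for illustration.

```python
import torch
import torch.nn as nn

class ScaleSpecificBatchNorm(nn.Module):
    """One BatchNorm2d per training resolution (illustrative sketch)."""

    def __init__(self, num_features, scales=(128, 160, 192, 224)):
        super().__init__()
        self.bns = nn.ModuleDict({
            str(s): nn.BatchNorm2d(num_features) for s in scales
        })

    def forward(self, x, scale):
        # 'scale' is the resolution the current batch was resized to.
        return self.bns[str(scale)](x)

# Usage sketch: bn = ScaleSpecificBatchNorm(64); y = bn(features, scale=224)
```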
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
TLDR: This study trains ViT models of various sizes on the public ImageNet-21k dataset which either match or outperform their counterparts trained on the larger, but not publicly available, JFT-300M dataset.
Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet
TLDR: By slightly tuning the structure of vision transformers and introducing token labeling, a new training objective, these models are able to achieve better results than CNN counterparts and other transformer-based classification models with a similar amount of training parameters and computation.
Training data-efficient image transformers & distillation through attention
TLDR: This work produces a competitive convolution-free transformer by training on ImageNet only, and introduces a teacher-student strategy specific to transformers that relies on a distillation token ensuring that the student learns from the teacher through attention.
Learning to Learn Parameterized Classification Networks for Scalable Input Images
TLDR: This work employs meta learners to generate convolutional weights of main networks for various input scales and maintains privatized Batch Normalization layers per scale for improved training performance, achieving an improved accuracy-efficiency trade-off during the adaptive inference process.
Big Transfer (BiT): General Visual Representation Learning
TLDR: By combining a few carefully selected components, and transferring using a simple heuristic, Big Transfer achieves strong performance on over 20 datasets and performs well across a surprisingly wide range of data regimes, from 1 example per class to 1M total examples.

References

Showing 1-10 of 54 references
Do Better ImageNet Models Transfer Better?
TLDR: It is found that, when networks are used as fixed feature extractors or fine-tuned, there is a strong correlation between ImageNet accuracy and transfer accuracy, and that ImageNet features are less general than previously suggested.
AutoAugment: Learning Augmentation Policies from Data
TLDR: This paper describes a simple procedure called AutoAugment to automatically search for improved data augmentation policies, which achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features
TLDR: Patches are cut and pasted among training images, with the ground-truth labels also mixed proportionally to the area of the patches; CutMix consistently outperforms state-of-the-art augmentation strategies on CIFAR and ImageNet classification tasks, as well as on the ImageNet weakly-supervised localization task.
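The CutMix operation itself is compact: sample a mixing ratio, paste a random box from a shuffled copy of the batch, and weight the two labels by the pasted area. A hedged PyTorch-style sketch, not the reference implementation:

```python
import numpy as np
import torch

def cutmix(images, labels, alpha=1.0):
    """Illustrative CutMix: paste a random box from a shuffled batch and
    mix labels in proportion to the pasted area."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(images.size(0))
    _, _, h, w = images.shape

    # Sample a box whose area is roughly (1 - lam) of the image.
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Adjust lam to the actual pasted area.
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)
    return images, labels, labels[perm], lam

# Loss sketch: loss = lam * ce(out, y_a) + (1 - lam) * ce(out, y_b)
```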
Deep Residual Learning for Image Recognition
TLDR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize and can gain accuracy from considerably increased depth.
ImageNet classification with deep convolutional neural networks
TLDR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes, and employed a recently developed regularization method called "dropout" that proved to be very effective.
Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks
TLDR: This work designs a method to reuse layers trained on the ImageNet dataset to compute mid-level image representations for images in the PASCAL VOC dataset, and shows that despite differences in image statistics and tasks between the two datasets, the transferred representation leads to significantly improved results for object and action classification.
Bag of Tricks for Image Classification with Convolutional Neural Networks
TLDR: This paper examines a collection of training procedure refinements and empirically evaluates their impact on final model accuracy through ablation studies, showing that by combining these refinements together, various CNN models can be improved significantly.
MultiGrain: a unified image embedding for classes and instances
TLDR: A key component of MultiGrain is a pooling layer that takes advantage of high-resolution images with a network trained at a lower resolution, and the resulting embedding provides state-of-the-art classification accuracy when fed to a linear classifier.
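The pooling referred to here is in the spirit of generalized-mean (GeM) pooling, which interpolates between average and max pooling and tolerates feeding larger images than the training resolution. A compact sketch, with the exponent p chosen arbitrarily:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeMPool(nn.Module):
    """Generalized-mean pooling: (mean(x^p))^(1/p).

    p = 1 recovers average pooling; large p approaches max pooling, which
    makes the pooled embedding more tolerant to feeding higher-resolution
    images than the network was trained on.
    """

    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p, self.eps = p, eps

    def forward(self, x):  # x: (batch, channels, h, w)
        x = x.clamp(min=self.eps).pow(self.p)
        return F.adaptive_avg_pool2d(x, 1).pow(1.0 / self.p).flatten(1)

# Usage sketch: emb = GeMPool(p=3)(backbone_features)
```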
Exploring the Limits of Weakly Supervised Pretraining
TLDR: This paper presents a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images, and shows improvements on several image classification and object detection tasks, reporting the highest ImageNet-1k single-crop top-1 accuracy to date.
End-to-End Learning of Deep Visual Representations for Image Retrieval
TLDR: This article uses a large-scale but noisy landmark dataset and develops an automatic cleaning method that produces a suitable training set for deep retrieval; it builds on the recent R-MAC descriptor, which can be interpreted as a deep and differentiable architecture, and presents improvements to enhance it.