Corpus ID: 212747706

Fixing the train-test resolution discrepancy: FixEfficientNet

@article{Touvron2020FixingTT,
  title={Fixing the train-test resolution discrepancy: FixEfficientNet},
  author={Hugo Touvron and Andrea Vedaldi and Matthijs Douze and Herv{\'e} J{\'e}gou},
  journal={ArXiv},
  year={2020},
  volume={abs/2003.08237}
}
This note complements the paper "Fixing the train-test resolution discrepancy" that introduced the FixRes method. First, we show that this strategy is advantageously combined with recent training recipes from the literature. Most importantly, we provide new results for the EfficientNet architecture. The resulting network, called FixEfficientNet, significantly outperforms the initial architecture with the same number of parameters. For instance, our FixEfficientNet-B0 trained without additional… 
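
A minimal PyTorch sketch of the FixRes recipe summarized in the abstract: train at a resolution below the target test resolution, then briefly fine-tune only the classifier at the test resolution. The resolutions, backbone, and optimizer settings here are illustrative assumptions, not the paper's exact values.

    # FixRes-style train/test resolution split (illustrative values).
    import torch
    import torchvision.transforms as T
    from torchvision.models import resnet50

    train_res, test_res = 160, 224            # train below the test resolution

    train_tf = T.Compose([T.RandomResizedCrop(train_res),
                          T.RandomHorizontalFlip(), T.ToTensor()])
    test_tf = T.Compose([T.Resize(int(test_res * 1.15)),
                         T.CenterCrop(test_res), T.ToTensor()])

    model = resnet50(num_classes=1000)
    # ... standard training loop using train_tf at train_res ...

    # FixRes fine-tuning: freeze the trunk and adapt the classifier (the
    # paper also re-estimates batch-norm statistics) at the test resolution.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():
        p.requires_grad = True
    opt = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
    # ... brief fine-tuning loop using test_tf at test_res ...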

Citations

Fixing the train-test resolution discrepancy
TLDR
It is experimentally validated that, for a target test resolution, using a lower train resolution offers better classification at test time, and a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ is proposed.
Training data-efficient image transformers & distillation through attention
TLDR
This work produces a competitive convolution-free transformer by training on ImageNet only, and introduces a teacher-student strategy specific to transformers that relies on a distillation token ensuring that the student learns from the teacher through attention.
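
A hedged sketch of the distillation-token idea from the summary above: a second learnable token travels alongside the class token through the encoder, and its output head is supervised by the teacher's predictions. The dimensions and two-layer encoder are toy assumptions.

    # Class token + distillation token (toy dimensions).
    import torch
    import torch.nn as nn

    class TinyDeiT(nn.Module):
        def __init__(self, dim=192, num_classes=1000):
            super().__init__()
            self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.dist_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=3,
                                           batch_first=True), num_layers=2)
            self.head = nn.Linear(dim, num_classes)       # true-label loss
            self.head_dist = nn.Linear(dim, num_classes)  # teacher loss

        def forward(self, patches):                       # (B, N, dim) tokens
            b = patches.size(0)
            x = torch.cat([self.cls_token.expand(b, -1, -1),
                           self.dist_token.expand(b, -1, -1), patches], dim=1)
            x = self.encoder(x)
            return self.head(x[:, 0]), self.head_dist(x[:, 1])

    logits, logits_dist = TinyDeiT()(torch.randn(2, 196, 192))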
Self-supervised Pretraining of Visual Features in the Wild
TLDR
The final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting.
Convolution Filter Pruning for Transfer Learning on Small Dataset
TLDR
A scheme that combines model compression and transfer learning to reduce the size of a pre-trained full-scale model using a domain-specific dataset; it selects which structures and parameters to prune for the target dataset, making the subsequent transfer learning more efficient.
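
The summary above describes pruning a pre-trained model for a small target dataset. Below is a generic magnitude-based (L1-norm) filter-pruning sketch; the paper's exact criterion for choosing structures and parameters may differ.

    # Keep the conv filters with the largest L1 norms (generic criterion).
    import torch
    import torch.nn as nn

    def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
        scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # one score per filter
        n_keep = max(1, int(keep_ratio * conv.out_channels))
        keep = scores.topk(n_keep).indices
        pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                           conv.stride, conv.padding, bias=conv.bias is not None)
        pruned.weight.data = conv.weight.data[keep].clone()
        if conv.bias is not None:
            pruned.bias.data = conv.bias.data[keep].clone()
        return pruned  # downstream layers must be re-wired to the kept channels

    print(prune_conv_filters(nn.Conv2d(64, 128, 3, padding=1), 0.25))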
Soft Contrastive Learning for Visual Localization
TLDR
This paper argues that any artificial division based on a proximity measure is undesirable, due to the inherently ambiguous supervision for images near the proximity threshold, and proposes a novel technique that uses soft positive/negative assignments of images for contrastive learning, avoiding this problem.
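
A generic illustration of the soft-assignment idea: pair weights decay continuously with metric distance instead of flipping at a hard proximity threshold. The weighting function and temperatures below are assumptions, not the paper's exact loss.

    # Softly weighted contrastive objective (illustrative form).
    import torch
    import torch.nn.functional as F

    def soft_contrastive_loss(emb, dist, tau=0.07, sigma=5.0):
        # emb: (B, D) L2-normalized embeddings; dist: (B, B) pose distances.
        sim = emb @ emb.t() / tau
        soft_pos = torch.exp(-dist / sigma)   # ~1 when close, -> 0 when far
        soft_pos.fill_diagonal_(0)            # ignore self-pairs
        log_p = F.log_softmax(sim, dim=1)
        return -(soft_pos * log_p).sum(1).div(soft_pos.sum(1).clamp(min=1e-6)).mean()

    emb = F.normalize(torch.randn(8, 128), dim=1)
    print(soft_contrastive_loss(emb, torch.rand(8, 8) * 20))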
Going deeper with Image Transformers
TLDR
This work builds and optimizes deeper transformer networks for image classification and investigates the interplay of architecture and optimization in such dedicated transformers, making two architecture changes that significantly improve the accuracy of deep transformers.
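
One of the two changes is LayerScale, sketched below: each residual branch is multiplied by small learnable per-channel weights so that very deep stacks start near identity and train stably (the dimensions and MLP branch are illustrative).

    # LayerScale: learnable per-channel scaling of the residual branch.
    import torch
    import torch.nn as nn

    class LayerScaleBlock(nn.Module):
        def __init__(self, dim=192, init_eps=1e-5):
            super().__init__()
            self.norm = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                     nn.Linear(4 * dim, dim))
            self.gamma = nn.Parameter(init_eps * torch.ones(dim))  # near-zero init

        def forward(self, x):
            return x + self.gamma * self.mlp(self.norm(x))

    print(LayerScaleBlock()(torch.randn(2, 16, 192)).shape)  # (2, 16, 192)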
Data Augmentation via Structured Adversarial Perturbations
TLDR
This work proposes a method to generate adversarial examples that maintain some desired natural structure and demonstrates this approach through two types of image transformations: photometric and geometric.
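
A toy sketch of a photometric structured perturbation: instead of per-pixel noise, a single brightness parameter is pushed in the gradient direction that increases the loss. The paper's transformations and optimization are richer than this one-step example.

    # One FGSM-style step on a global brightness parameter.
    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet18

    model = resnet18(num_classes=10).eval()
    x, y = torch.rand(1, 3, 64, 64), torch.tensor([3])

    delta = torch.zeros(1, requires_grad=True)    # structured: one scalar shift
    loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    loss.backward()
    with torch.no_grad():
        delta += 0.1 * delta.grad.sign()          # move brightness uphill in loss
    x_adv = (x + delta).clamp(0, 1).detach()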
Representation Learning with Video Deep InfoMax
TLDR
This paper finds that drawing views from both natural-rate sequences and temporally-downsampled sequences yields results on Kinetics-pretrained action recognition tasks which match or outperform prior state-of-the-art methods that use more costly large-time-scale transformer models.
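
A minimal sketch of the two kinds of views described above: one at the natural frame rate and one temporally downsampled (the clip length and sampling rates are illustrative).

    # Natural-rate and temporally-downsampled views of one clip.
    import torch

    video = torch.randn(64, 3, 112, 112)        # (T, C, H, W) raw clip
    t = torch.randint(0, 32, (1,)).item()
    natural_view = video[t : t + 32]            # 32 consecutive frames
    downsampled_view = video[::4]               # every 4th frame: coarser time scale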
Self-supervised Neural Architecture Search
TLDR
It is shown that a self-supervised neural architecture search, which allows finding novel network models without the need for labeled data, leads to results comparable to supervised training with a "fully labeled" NAS, and that it can improve the performance of self-supervised learning.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
TLDR
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
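
A minimal sketch of the patch-embedding step behind the title: the image is cut into 16x16 patches and each patch is linearly projected into a token. The strided convolution below is the standard implementation of that projection.

    # "An image is worth 16x16 words": patchify + linear projection.
    import torch
    import torch.nn as nn

    patch, dim = 16, 768
    proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # one token per patch

    img = torch.randn(1, 3, 224, 224)
    tokens = proj(img).flatten(2).transpose(1, 2)  # (1, 196, 768): 14x14 tokens
    print(tokens.shape)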

References

SHOWING 1-10 OF 18 REFERENCES
Fixing the train-test resolution discrepancy
TLDR
It is experimentally validated that, for a target test resolution, using a lower train resolution offers better classification at test time, and a simple yet effective and efficient strategy to optimize the classifier performance when the train and test resolutions differ is proposed.
Adversarial Examples Improve Image Recognition
TLDR
This work proposes AdvProp, an enhanced adversarial training scheme that treats adversarial examples as additional training examples to prevent overfitting, and shows that AdvProp improves a wide range of models on various image recognition tasks and performs better when the models are bigger.
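
A toy sketch of treating adversarial examples as additional training examples, as the summary describes. The actual AdvProp scheme also routes clean and adversarial batches through separate batch-norm statistics, which this sketch omits.

    # Clean + adversarial examples in one training step (toy version).
    import torch
    import torch.nn.functional as F
    from torchvision.models import resnet18

    model = resnet18(num_classes=10)
    x, y = torch.rand(4, 3, 64, 64), torch.randint(0, 10, (4,))

    x.requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    x_adv = (x + 0.01 * grad.sign()).clamp(0, 1).detach()  # one-step attack

    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()  # both batches update the same weights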
AutoAugment: Learning Augmentation Policies from Data
TLDR
This paper describes a simple procedure called AutoAugment to automatically search for improved data augmentation policies, which achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
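
A sketch of applying one AutoAugment-style sub-policy: two operations, each with its own application probability and magnitude. The operations and values below are illustrative, not a policy found by the search.

    # One sub-policy = two (op, probability, magnitude) triples.
    import random
    from PIL import Image, ImageOps

    sub_policy = [
        (lambda im, v: ImageOps.posterize(im, int(8 - 4 * v)), 0.4, 0.8),
        (lambda im, v: im.rotate(30 * v), 0.6, 0.3),
    ]

    def apply_sub_policy(img: Image.Image) -> Image.Image:
        for op, p, mag in sub_policy:
            if random.random() < p:     # each op fires with its own probability
                img = op(img, mag)
        return img

    out = apply_sub_policy(Image.new("RGB", (224, 224), "gray"))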
Randaugment: Practical automated data augmentation with a reduced search space
TLDR
This work proposes a simplified search space that vastly reduces the computational expense of automated augmentation, and permits the removal of a separate proxy task.
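
A sketch of the reduced search space: just two integers, the number of transforms N and one shared magnitude M. The operation list here is a small illustrative subset.

    # RandAugment: sample N ops, all applied at one global magnitude M.
    import random
    from PIL import Image, ImageEnhance, ImageOps

    def randaugment(img: Image.Image, n: int = 2, m: int = 9) -> Image.Image:
        ops = [
            lambda im, v: ImageEnhance.Contrast(im).enhance(1 + v),
            lambda im, v: ImageEnhance.Brightness(im).enhance(1 + v),
            lambda im, v: ImageOps.solarize(im, int(256 - 256 * v)),
            lambda im, v: im.rotate(30 * v),
        ]
        for op in random.choices(ops, k=n):   # N uniformly sampled ops
            img = op(img, m / 30.0)           # one shared magnitude M
        return img

    out = randaugment(Image.new("RGB", (224, 224), "gray"))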
Exploring the Limits of Weakly Supervised Pretraining
TLDR
This paper presents a unique study of transfer learning with large convolutional networks trained to predict hashtags on billions of social media images and shows improvements on several image classification and object detection tasks, and reports the highest ImageNet-1k single-crop, top-1 accuracy to date.
Learning Transferable Architectures for Scalable Image Recognition
TLDR
This paper proposes to search for an architectural building block on a small dataset and then transfer the block to a larger dataset and introduces a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models.
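
A sketch of the ScheduledDropPath idea: entire residual branches are dropped per example, with a drop probability that grows linearly over training (the schedule and rate are illustrative).

    # Drop whole branches with a linearly increasing probability.
    import torch

    def scheduled_drop_path(branch, step, total_steps, final_drop=0.3, training=True):
        drop_prob = final_drop * step / total_steps   # linear schedule
        if not training or drop_prob == 0.0:
            return branch
        keep = 1.0 - drop_prob
        mask = branch.new_empty(branch.size(0), 1, 1, 1).bernoulli_(keep)
        return branch * mask / keep                   # rescale survivors

    x = torch.randn(8, 64, 14, 14)
    print(scheduled_drop_path(x, step=5000, total_steps=10000).shape)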
Do ImageNet Classifiers Generalize to ImageNet?
TLDR
The results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
TLDR
A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient, and its effectiveness is demonstrated by scaling up MobileNets and ResNet.
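
A worked example of the compound coefficient using the base coefficients reported in the paper: depth, width, and resolution grow as alpha^phi, beta^phi, and gamma^phi under the constraint alpha * beta^2 * gamma^2 ≈ 2, so FLOPs roughly double with each increment of phi.

    # EfficientNet compound scaling (paper's base coefficients).
    alpha, beta, gamma = 1.2, 1.1, 1.15
    print(alpha * beta**2 * gamma**2)   # ~1.92, close to the FLOPs constraint of 2

    phi = 3                             # scale up three steps
    print(alpha ** phi)                 # ~1.73x deeper
    print(beta ** phi)                  # ~1.33x wider
    print(gamma ** phi)                 # ~1.52x higher input resolution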
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
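
A minimal sketch of the residual block behind this framework: the stacked layers learn a residual F(x) that is added back to an identity shortcut, y = F(x) + x, which is what makes much deeper networks easy to optimize.

    # Basic residual block: y = F(x) + x.
    import torch
    import torch.nn as nn

    class BasicBlock(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))
            return self.relu(out + x)   # identity shortcut

    print(BasicBlock()(torch.randn(1, 64, 56, 56)).shape)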
Are we done with ImageNet?
TLDR
A significantly more robust procedure for collecting human annotations of the ImageNet validation set is developed, which finds the original ImageNet labels to no longer be the best predictors of this independently-collected set, indicating that their usefulness in evaluating vision models may be nearing an end.