Corpus ID: 238252949

ResNet strikes back: An improved training procedure in timm

  title={ResNet strikes back: An improved training procedure in timm},
  author={Ross Wightman and Hugo Touvron and Herv\'e J\'egou},
The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimization and data-augmentation methods have increased the effectiveness of training recipes. In this paper, we… 

Improving the Generalization of Supervised Models

This paper enriches the common supervised training framework using two key components of recent SSL models: multi-scale crops for data augmentation and the use of an expendable projector head, and it replaces the last layer of class weights with class prototypes computed on the fly using a memory bank.

Heed the noise in performance evaluations in neural architecture search

This work proposes to reduce noise in architecture evaluations by evaluating architectures based on average performance over multiple network training runs with different random seeds and cross-validation, and shows that reducing noise in architecture evaluations enables all considered search algorithms to find better architectures.

Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance

This work presents a new approach to receptive field analysis that yields theoretical and empirical performance gains across the twenty well-known CNN architectures examined in the authors' experiments, and increases parameter efficiency across past and current top-performing CNN architectures.

DeiT III: Revenge of the ViT

This paper revisits the supervised training of ViTs and builds upon and simplifies a recipe introduced for training ResNet-50, and includes a new simple data-augmentation procedure with only 3 augmentations, closer to the practice in self-supervised learning.

Co-training 2L Submodels for Visual Recognition

It is shown that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation, and is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNeXt.

Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

ReLICv2 is the first unsupervised representation learning method to consistently outperform a standard supervised baseline in a like-for-like comparison across a wide range of ResNet architectures, and is comparable to state-of-the-art self-supervised vision transformers.

Back-to-Bones: Rediscovering the Role of Backbones in Domain Generalization

Back-to-Bones starts by proposing a comprehensive analysis of backbones' intrinsic generalization capabilities, so far ignored by the research community, and shows that, by adopting competitive backbones in conjunction with effective data augmentation, plain ERM outperforms recent DG solutions and achieves state-of-the-art accuracy.

A Light Recipe to Train Robust Vision Transformers

This paper shows that ViTs are highly suitable for adversarial training to achieve competitive performance, and recommends that the community avoid directly translating the canonical ViT training recipes to robust training and instead rethink common training choices in the context of adversarial training.

Revisiting Batch Normalization

This work revisits the BN formulation and presents a new initialization method and update approach for BN to address the aforementioned issues, and also introduces a new online BN-based input-data normalization technique to alleviate the need for other offline or fixed methods.

Disentangling Architecture and Training for Optical Flow

Three prominent architectures, PWC-Net, IRR-PWC and RAFT, are revisited with a common set of modern training techniques and datasets, and significant performance gains are observed, demonstrating the importance and generality of these training details.

Revisiting ResNets: Improved Training and Scaling Strategies

It is found that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

This paper empirically shows that on the ImageNet dataset large minibatches cause optimization difficulties, but that when these are addressed the trained networks exhibit good generalization, enabling the training of visual recognition models on internet-scale data with high efficiency.
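
The recipe summarized above rests on a linear learning-rate scaling rule combined with gradual warmup; a minimal sketch of those two rules (function names and the base values of 256 images and 0.1 base lr are illustrative defaults, not taken from the paper's code):

```python
def scaled_lr(base_lr: float, batch_size: int, base_batch: int = 256) -> float:
    """Linear scaling rule: scale the learning rate proportionally to minibatch size."""
    return base_lr * batch_size / base_batch

def warmup_lr(target_lr: float, step: int, warmup_steps: int) -> float:
    """Gradual warmup: ramp the learning rate linearly from ~0 up to target_lr."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps

# With a base lr of 0.1 per 256 images, a batch of 8192 yields a 32x larger lr,
# reached gradually over the warmup period rather than applied from step 0.
lr = scaled_lr(0.1, 8192)
```
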

Aggregated Residual Transformations for Deep Neural Networks

On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy, and is more effective than going deeper or wider when the authors increase the capacity.

Rethinking the Inception Architecture for Computer Vision

This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.

AutoAugment: Learning Augmentation Policies from Data

This paper describes a simple procedure called AutoAugment to automatically search for improved data augmentation policies, which achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).

High-Performance Large-Scale Image Recognition Without Normalization

An adaptive gradient clipping technique is developed to overcome the instabilities that arise when training without batch normalization, and a significantly improved class of Normalizer-Free ResNets is designed that attains significantly better performance when fine-tuning on ImageNet.
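
The clipping described above bounds each gradient by the ratio of its norm to the corresponding parameter's norm, rather than by a fixed threshold. A rough NumPy sketch of that idea for a single parameter tensor (the function name, threshold, and epsilon are placeholders; the actual method applies this unit-wise, per output channel):

```python
import numpy as np

def adaptive_grad_clip(grad, param, clip=0.01, eps=1e-3):
    """Rescale grad so that ||grad|| / max(||param||, eps) never exceeds `clip`."""
    g_norm = np.linalg.norm(grad)
    p_norm = max(np.linalg.norm(param), eps)  # eps guards near-zero weights
    max_norm = clip * p_norm
    if g_norm > max_norm:
        grad = grad * (max_norm / g_norm)
    return grad
```

The key design choice is that the clipping threshold adapts to the weight scale: large weights tolerate large gradients, while small weights get proportionally tighter clipping.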

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

The empirical results demonstrate the superior performance of LAMB across various tasks, such as BERT and ResNet-50 training, with very little hyperparameter tuning; the optimizer enables the use of very large batch sizes of 32868 without any degradation of performance.
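
LAMB's distinguishing ingredient is a layer-wise trust ratio that rescales each layer's Adam-style update by the ratio of the weight norm to the update norm. A minimal sketch of just that component (function name and the fallback-to-1.0 convention are simplifications; this is not the full optimizer, which also includes bias correction and weight decay):

```python
import numpy as np

def lamb_trust_ratio(param, update):
    """LAMB-style layer-wise trust ratio ||w|| / ||update||, defaulting to 1.0."""
    w_norm = np.linalg.norm(param)
    u_norm = np.linalg.norm(update)
    if w_norm == 0.0 or u_norm == 0.0:
        return 1.0
    return w_norm / u_norm

# The per-layer step is then lr * trust_ratio * update, so layers with small
# updates relative to their weights take proportionally larger steps.
```
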

EfficientNetV2: Smaller Models and Faster Training

An improved method of progressive learning is proposed, which adaptively adjusts regularization (e.g., dropout and data augmentation) along with image size, and significantly outperforms previous models on ImageNet and the CIFAR/Cars/Flowers datasets.

Slowing Down the Weight Norm Increase in Momentum-based Optimizers

The modified optimizers SGDP and AdamP successfully regularize the norm growth and improve the performance of a broad set of models, including image classification and retrieval, object detection, robustness benchmarks, and audio classification.
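The norm-growth control behind SGDP and AdamP removes the radial (norm-increasing) component of each update for scale-invariant weights. A minimal sketch of that projection (the function name is illustrative, and the actual optimizers apply it selectively, based on the angle between the weight and its gradient):

```python
import numpy as np

def project_out_radial(update, param):
    """Remove the component of `update` parallel to `param`,
    leaving only the tangential part that rotates the weight
    without growing its norm."""
    p_hat = param / np.linalg.norm(param)
    return update - np.dot(update, p_hat) * p_hat
```

Because the remaining update is orthogonal to the weight, a gradient step changes the weight's direction while (to first order) leaving its norm alone, which is what slows the norm increase under momentum.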

Fixing the train-test resolution discrepancy

It is experimentally validated that, for a target test resolution, using a lower train resolution offers better classification at test time, and a simple yet effective and efficient strategy is proposed to optimize classifier performance when the train and test resolutions differ.