DyRep: Bootstrapping Training with Dynamic Re-parameterization

@article{Huang2022DyRepBT,
  title={DyRep: Bootstrapping Training with Dynamic Re-parameterization},
  author={Tao Huang and Shan You and Bohan Zhang and Yuxuan Du and Fei Wang and Chen Qian and Chang Xu},
  journal={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022},
  pages={578-587}
}
  • Published 24 March 2022
  • Computer Science
  • 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Structural re-parameterization (Rep) methods achieve noticeable improvements on simple VGG-style networks. Despite their prevalence, current Rep methods simply re-parameterize all operations into an augmented network, including those that rarely contribute to the model's performance. As such, the price to pay is an expensive computational overhead to manipulate these unnecessary behaviors. To eliminate the above caveats, we aim to bootstrap the training with minimal cost by devising a dynamic re… 
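
Structural re-parameterization trains a block with several parallel branches and then folds them into a single convolution for inference. Below is a minimal sketch of that generic merge for a RepVGG/DBB-style block with a 3x3 conv, a 1x1 conv, and an identity branch (each followed by BatchNorm); it illustrates the fusion itself, not DyRep's dynamic policy for deciding which operations to augment, and the helper names are ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Fold a BatchNorm into the preceding convolution's weight and bias."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                           # per-output-channel scale
    weight = conv.weight * scale.reshape(-1, 1, 1, 1)
    bias = bn.bias - bn.running_mean * scale
    if conv.bias is not None:
        bias = bias + conv.bias * scale
    return weight, bias

def merge_branches(conv3x3, bn3x3, conv1x1, bn1x1, bn_id=None):
    """Collapse 3x3, 1x1 and (optional) identity branches into one 3x3 kernel."""
    w3, b3 = fuse_conv_bn(conv3x3, bn3x3)
    w1, b1 = fuse_conv_bn(conv1x1, bn1x1)
    w1 = F.pad(w1, [1, 1, 1, 1])                      # pad 1x1 kernel to 3x3
    weight, bias = w3 + w1, b3 + b1
    if bn_id is not None:                             # identity branch = BN over an implicit identity conv
        c = conv3x3.in_channels
        w_id = torch.zeros_like(w3)
        w_id[torch.arange(c), torch.arange(c), 1, 1] = 1.0
        std = (bn_id.running_var + bn_id.eps).sqrt()
        scale = bn_id.weight / std
        weight = weight + w_id * scale.reshape(-1, 1, 1, 1)
        bias = bias + bn_id.bias - bn_id.running_mean * scale
    return weight, bias
```

At deployment the merged weight and bias can be loaded into a single `nn.Conv2d(in_channels, out_channels, 3, padding=1)`, so the extra branches cost nothing at inference; the point of DyRep, per the abstract, is to decide during training which operations actually deserve such augmentation instead of re-parameterizing everything.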

Efficient Re-parameterization Operations Search for Easy-to-Deploy Network Based on Directional Evolutionary Strategy

An improved re-parameterization search space is designed that includes more types of re-parameterization operations and can improve the performance of convolutional networks; the output features of the searched architectures are visualized to analyze why particular re-parameterization architectures are formed.

De-IReps: Searching for improved Re-parameterizing Architecture based on Differentiable Evolution Strategy

This work designs a search space that covers almost all re-parameterization operations and proposes a differentiable evolutionary strategy (DES) to explore it.

Knowledge Distillation from A Stronger Teacher

This paper shows that simply preserving the relations between the predictions of the teacher and the student suffices, proposes a correlation-based loss to explicitly capture the intrinsic inter-class relations from the teacher, and extends this relational match to the intra-class level.
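
Such a relational match can be written as a correlation distance between teacher and student prediction distributions, applied across classes for each sample (inter-class) and across samples for each class (intra-class). The sketch below is a minimal Pearson-correlation version of that idea; the temperature and the equal weighting of the two terms are placeholders, and the exact formulation follows the paper.

```python
import torch
import torch.nn.functional as F

def pearson_distance(a, b, eps=1e-8):
    """1 - Pearson correlation between matching rows of a and b."""
    a = a - a.mean(dim=-1, keepdim=True)
    b = b - b.mean(dim=-1, keepdim=True)
    corr = (a * b).sum(-1) / (a.norm(dim=-1) * b.norm(dim=-1) + eps)
    return (1.0 - corr).mean()

def relation_kd_loss(student_logits, teacher_logits, tau=4.0):
    """Match inter-class relations (per sample) and intra-class relations (per class)."""
    s = F.softmax(student_logits / tau, dim=-1)   # (batch, classes)
    t = F.softmax(teacher_logits / tau, dim=-1)
    inter = pearson_distance(s, t)                # each row: one sample across classes
    intra = pearson_distance(s.t(), t.t())        # each row: one class across the batch
    return inter + intra
```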

ViTAS: Vision Transformer Architecture Search

This paper develops a new cyclic weight-sharing mechanism for the token embeddings of ViTs, which enables each channel to contribute more evenly to all candidate architectures, and proposes identity shifting to alleviate the many-to-one issue in the superformer.

LightViT: Towards Light-Weight Convolution-Free Vision Transformers

This paper introduces a global yet efficient aggregation scheme into both the self-attention and feed-forward network of ViTs, where additional learnable tokens are introduced to capture global dependencies, and bi-dimensional channel and spatial attentions are imposed over token embeddings.

References

Showing 1-10 of 37 references

RepNAS: Searching for Efficient Re-parameterizing Blocks

RepNAS, a one-stage NAS approach, is presented to efficiently search the optimal diverse branch block (ODBB) for each layer under a branch-number constraint, and experimental results show the searched ODBB can easily surpass the manually designed diverse branch block (DBB) with efficient training.

SNIP: Single-shot Network Pruning based on Connection Sensitivity

This work presents a new approach that prunes a given network once at initialization prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task.
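
SNIP's connection-sensitivity score for each weight is the magnitude of gradient times weight on a mini-batch at initialization, normalized over all connections; the highest-scoring fraction is kept. A minimal sketch under those assumptions, using a standard classification loss and a placeholder keep ratio:

```python
import torch
import torch.nn.functional as F

def snip_masks(model, inputs, targets, keep_ratio=0.1):
    """Compute SNIP keep-masks from connection sensitivity |dL/dw * w| on one batch."""
    model.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()

    weights = [p for p in model.parameters() if p.grad is not None and p.dim() > 1]
    scores = [(p.grad * p.detach()).abs() for p in weights]   # sensitivity per connection
    flat = torch.cat([s.flatten() for s in scores])
    norm = flat.sum()                                          # normalize saliencies

    k = max(1, int(keep_ratio * flat.numel()))
    threshold = torch.topk(flat / norm, k).values[-1]
    return [((s / norm) >= threshold).float() for s in scores]
```

The returned masks are applied multiplicatively to the corresponding weight tensors once, before training proceeds on the pruned network.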

Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers

This paper proposes a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that focuses on direct simplification of the channel-to-channel computation graph of a CNN, without the need to perform a computationally difficult and not-always-useful task.

RepVGG: Making VGG-style ConvNets Great Again

We present a simple but powerful architecture of convolutional neural network, which has a VGG-like inference-time body composed of nothing but a stack of 3 × 3 convolution and ReLU, while the training-time model has a multi-branch topology.

Picking Winning Tickets Before Training by Preserving Gradient Flow

This work argues that efficient training requires preserving the gradient flow through the network, and proposes a simple but effective pruning criterion called Gradient Signal Preservation (GraSP), which achieves significantly better performance than the baseline at extreme sparsity levels.
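
GraSP scores each weight by how its removal would change the gradient norm, which requires a Hessian-gradient product obtained with double backpropagation. The sketch below follows one common reading of the criterion, score = -θ ⊙ Hg, on a single batch; the paper's temperature scaling and multi-batch averaging are omitted, and the helper name is ours.

```python
import torch
import torch.nn.functional as F

def grasp_scores(model, inputs, targets):
    """Score weights by -theta * (H g), where H g is a Hessian-gradient product."""
    weights = [p for p in model.parameters() if p.dim() > 1]

    # First backward pass: keep the gradients in the graph for double backprop.
    loss = F.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, weights, create_graph=True)

    # g^T stop_grad(g); its gradient w.r.t. the weights is the product H g.
    gnorm = sum((g * g.detach()).sum() for g in grads)
    hg = torch.autograd.grad(gnorm, weights)

    # Removing a weight perturbs theta by -theta, changing gradient flow by ~ -theta * Hg.
    return [-(w.detach() * h) for w, h in zip(weights, hg)]
```

Under this sign convention, the highest-scoring weights are the ones removed, so a given keep ratio retains the connections whose removal would most reduce gradient flow.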

Aggregated Residual Transformations for Deep Neural Networks

On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when capacity is increased.
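
Cardinality here means splitting a transformation into parallel groups of the same topology; in PyTorch this is naturally expressed with a grouped convolution. A minimal sketch of an aggregated-transformation bottleneck with hypothetical widths (cardinality 32, width 4 per group, 256 input/output channels, roughly matching a ResNeXt-50 stage); exact widths and strides are assumptions.

```python
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """Bottleneck with aggregated residual transformations via a grouped 3x3 conv."""
    def __init__(self, in_ch=256, out_ch=256, cardinality=32, group_width=4):
        super().__init__()
        mid = cardinality * group_width              # e.g. 32 groups * 4 channels = 128
        self.reduce = nn.Sequential(nn.Conv2d(in_ch, mid, 1, bias=False),
                                    nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.group = nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1,
                                             groups=cardinality, bias=False),
                                   nn.BatchNorm2d(mid), nn.ReLU(inplace=True))
        self.expand = nn.Sequential(nn.Conv2d(mid, out_ch, 1, bias=False),
                                    nn.BatchNorm2d(out_ch))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(x + self.expand(self.group(self.reduce(x))))

x = torch.randn(1, 256, 56, 56)
print(ResNeXtBottleneck()(x).shape)                  # torch.Size([1, 256, 56, 56])
```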

Data-Driven Sparse Structure Selection for Deep Neural Networks

A simple and effective framework to learn and prune deep models in an end-to-end manner by adding sparsity regularization on scaling factors, and solving the optimization problem with a modified stochastic Accelerated Proximal Gradient (APG) method.
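
The underlying idea is to attach scaling factors to structures (channels, groups, or blocks) and drive them to zero with a sparsity penalty. Below is a minimal sketch using a plain L1 penalty on BatchNorm scale factors plus a soft-threshold proximal step; the paper's actual modified APG solver is not reproduced, and the penalty strength is a placeholder.

```python
import torch
import torch.nn as nn

def sparsity_penalty(model, lam=1e-4):
    """L1 penalty over BatchNorm scale factors (one factor per channel/structure)."""
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

@torch.no_grad()
def soft_threshold(model, lam=1e-4, lr=0.1):
    """Proximal step for the L1 term: shrink factors toward zero, zeroing small ones."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            w = m.weight
            w.copy_(w.sign() * (w.abs() - lr * lam).clamp(min=0.0))
```

In use, `sparsity_penalty(model)` is added to the task loss each step (or `soft_threshold(model)` is called after the optimizer step); channels whose factors reach zero can then be pruned from the network.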

Towards Adaptive Residual Network Training: A Neural-ODE Perspective

An adaptive training algorithm for residual networks, LipGrow, automatically increases network depth and thus accelerates training; a novel performance measure specific to the depth increase is also proposed.

MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks

MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers, which is scalable to large networks, adaptable to specific resource constraints, and capable of increasing the network's performance.

GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet

This paper proposes a multi-path sampling strategy with rejection that greedily filters the weak paths to ease the burden on the supernet, encouraging it to focus more on evaluating the potentially good paths, which are identified using a surrogate portion of validation data.