DyRep: Bootstrapping Training with Dynamic Re-parameterization
@inproceedings{Huang2022DyRepBT, title={DyRep: Bootstrapping Training with Dynamic Re-parameterization}, author={Tao Huang and Shan You and Bohan Zhang and Yuxuan Du and Fei Wang and Chen Qian and Chang Xu}, booktitle={2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, year={2022}, pages={578-587} }
Structural re-parameterization (Rep) methods achieve noticeable improvements on simple VGG-style networks. Despite their prevalence, current Rep methods simply re-parameterize all operations into an augmented network, including those that rarely contribute to the model's performance. As such, the price to pay is an expensive computational overhead to manipulate these unnecessary behaviors. To eliminate the above caveats, we aim to bootstrap the training with minimal cost by devising a dynamic re…
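To make the re-parameterization idea concrete: training-time blocks with several parallel branches (e.g. a 3 × 3 convolution, a 1 × 1 convolution, and an identity shortcut, each followed by BatchNorm) can be algebraically collapsed into a single 3 × 3 convolution for inference. The snippet below is a minimal PyTorch sketch of that fusion, not the authors' implementation; the helper names `fuse_conv_bn` and `merge_into_3x3` are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    """Fold a BatchNorm layer into the preceding conv's weight and bias (inference-time)."""
    std = (bn.running_var + bn.eps).sqrt()
    scale = bn.weight / std                          # per-output-channel scale
    fused_w = conv.weight * scale.reshape(-1, 1, 1, 1)
    conv_b = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused_b = (conv_b - bn.running_mean) * scale + bn.bias
    return fused_w, fused_b

def merge_into_3x3(w3, b3, w1, b1):
    """Absorb a parallel 1x1 branch into the 3x3 branch by zero-padding its kernel."""
    w1_padded = F.pad(w1, [1, 1, 1, 1])              # 1x1 kernel -> centred 3x3 kernel
    return w3 + w1_padded, b3 + b1
```

An identity shortcut can be handled the same way by expressing it as a 1 × 1 convolution with an identity weight matrix before padding and summing.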
5 Citations
Efficient Re-parameterization Operations Search for Easy-to-Deploy Network Based on Directional Evolutionary Strategy
- Computer Science
- 2022
An improved re-parameterization search space is designed that includes more types of re-parameterization operations and can improve the performance of convolutional networks; the output features of the searched architectures are visualized to analyze why the resulting re-parameterization architectures form.
De-IReps: Searching for improved Re-parameterizing Architecture based on Differentiable Evolution Strategy
- Computer Science
- ArXiv
- 2022
This work designs a search space that covers almost all re-parameterization operations and proposes a differentiable evolutionary strategy (DES) to explore the re-parameterization search space.
Knowledge Distillation from A Stronger Teacher
- Computer Science
- ArXiv
- 2022
This paper shows that simply preserving the relations between the predictions of teacher and student would suffice, proposes a correlation-based loss to explicitly capture the intrinsic inter-class relations from the teacher, and extends this relational match to the intra-class level.
ViTAS: Vision Transformer Architecture Search
- Computer Science
- ECCV
- 2022
This paper develops a new cyclic weight-sharing mechanism for token embeddings of ViTs, which enables each channel to contribute more evenly to all candidate architectures, and proposes identity shifting to alleviate the many-to-one issue in the superformer.
LightViT: Towards Light-Weight Convolution-Free Vision Transformers
- Computer Science
- ArXiv
- 2022
This paper introduces a global yet efficient aggregation scheme into both the self-attention and feed-forward networks of ViTs, where additional learnable tokens are introduced to capture global dependencies, and bi-dimensional channel and spatial attentions are imposed over token embeddings.
References
Showing 1-10 of 37 references
RepNAS: Searching for Efficient Re-parameterizing Blocks
- Computer Science
- ArXiv
- 2021
RepNAS, a one-stage NAS approach, is presented to efficiently search the optimal diverse branch block (ODBB) for each layer under the branch number constraint, and experimental results show the searched ODBB can easily surpass the manual diverse branch block (DBB) with efficient training.
SNIP: Single-shot Network Pruning based on Connection Sensitivity
- Computer Science
- ICLR
- 2019
This work presents a new approach that prunes a given network once at initialization prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task.
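To illustrate the connection-sensitivity criterion described above: from a single mini-batch at initialization, SNIP scores every weight by the magnitude of its gradient times its value and prunes the globally lowest-scoring connections. The following is a hedged PyTorch sketch, not the paper's reference code; `snip_saliency` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F

def snip_saliency(model, inputs, targets):
    """SNIP-style connection sensitivity: |dL/dw * w| per weight, from one mini-batch."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = F.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, params)
    # A large |g * w| indicates a connection whose removal strongly perturbs the loss.
    return [(g * p).abs() for g, p in zip(grads, params)]
```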
Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
- Computer Science
- ICLR
- 2018
This paper proposes a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that focuses on direct simplification of the channel-to-channel computation graph of a CNN, without needing to perform a computationally difficult and not-always-useful task.
RepVGG: Making VGG-style ConvNets Great Again
- Computer Science
- 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2021
We present a simple but powerful convolutional neural network architecture, which has a VGG-like inference-time body composed of nothing but a stack of 3 × 3 convolutions and ReLU, while the…
Picking Winning Tickets Before Training by Preserving Gradient Flow
- Computer Science
- ICLR
- 2020
This work argues that efficient training requires preserving the gradient flow through the network, and proposes a simple but effective pruning criterion called Gradient Signal Preservation (GraSP), which achieves significantly better performance than the baseline at extreme sparsity levels.
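The gradient-flow idea can be sketched with a Hessian-vector product: each weight is scored by its value times the corresponding entry of Hg (the Hessian of the loss applied to the gradient), and the ranking of these scores decides which connections are removed. The snippet below is a rough single-batch approximation in PyTorch, omitting details such as the paper's temperature-scaled outputs; `grasp_scores` is a hypothetical helper.

```python
import torch
import torch.nn.functional as F

def grasp_scores(model, inputs, targets):
    """GraSP-style scores: theta * (H g), where H g is a Hessian-vector product."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = F.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Differentiating g^T stop_grad(g) w.r.t. the weights yields the product H g.
    gnorm = sum((g * g.detach()).sum() for g in grads)
    hg = torch.autograd.grad(gnorm, params)
    return [p.detach() * h for p, h in zip(params, hg)]
```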
Aggregated Residual Transformations for Deep Neural Networks
- Computer Science
- 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2017
On the ImageNet-1K dataset, it is empirically shown that, even under the restricted condition of maintaining complexity, increasing cardinality improves classification accuracy and is more effective than going deeper or wider when capacity is increased.
Data-Driven Sparse Structure Selection for Deep Neural Networks
- Computer Science
- ECCV
- 2018
A simple and effective framework is proposed to learn and prune deep models in an end-to-end manner by adding sparsity regularization on scaling factors and solving the optimization problem with a modified stochastic Accelerated Proximal Gradient (APG) method.
Towards Adaptive Residual Network Training: A Neural-ODE Perspective
- Computer Science
- ICML
- 2020
An adaptive training algorithm for residual networks, LipGrow, is proposed, which automatically increases network depth and thus accelerates training, together with a novel performance measure specific to the depth increase.
MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks
- Computer Science
- 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
- 2018
MorphNet iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers; the method is scalable to large networks, adaptable to specific resource constraints, and capable of increasing the network's performance.
GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet
- Computer Science
- 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
- 2020
This paper proposes a multi-path sampling strategy with rejection and greedily filters weak paths to ease the burden on the supernet by encouraging it to focus more on evaluating potentially good paths, which are identified using a surrogate portion of the validation data.