Corpus ID: 231861820

Locally Free Weight Sharing for Network Width Search

  • Xiu Su, Shan You, Tao Huang, Fei Wang, Chen Qian, Changshui Zhang, Chang Xu
  • Published 10 February 2021
  • Computer Science
  • ArXiv
Searching for network width is an effective way to slim deep neural networks under hardware budgets. With this aim, a one-shot supernet is usually leveraged as a performance evaluator to rank the performance w.r.t. different widths. Nevertheless, current methods mainly follow a manually fixed weight sharing pattern, which is limited in distinguishing the performance gaps between different widths. In this paper, to better evaluate each width, we propose a loCAlly FrEe weight sharing strategy (CafeNet…

Prioritized Architecture Sampling with Monte-Carlo Tree Search

  • Xiu Su, Tao Huang, …, Chang Xu
  • Computer Science
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
This paper introduces a sampling strategy based on Monte Carlo tree search (MCTS), with the search space modeled as a Monte Carlo Tree (MCT), which captures the dependency among layers and significantly improves search efficiency and performance.
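The node-selection rule at the heart of MCTS-based sampling can be sketched with the standard UCB1 (UCT) formula: exploit children with high mean reward while still exploring rarely-visited ones. This is a generic illustration, not the paper's implementation; the node statistics, the width-choice names, and the exploration constant `c` are assumptions.

```python
import math

def uct_select(children, total_visits, c=1.4):
    """Pick the child maximizing the UCB1 score.
    `children` maps a choice name -> (visits, cumulative_reward)."""
    def score(stats):
        visits, reward = stats
        if visits == 0:
            return float("inf")  # always try unvisited children first
        # mean reward (exploitation) + confidence bonus (exploration)
        return reward / visits + c * math.sqrt(math.log(total_visits) / visits)
    return max(children, key=lambda name: score(children[name]))

# toy tree node: per-layer width choices with (visits, cumulative accuracy)
children = {"width_0.5": (10, 6.0), "width_0.75": (5, 3.5), "width_1.0": (1, 0.9)}
best = uct_select(children, total_visits=16)
```

The rarely-visited `width_1.0` wins here despite a similar mean reward, because the exploration bonus dominates at low visit counts; as visits accumulate, selection shifts toward choices with genuinely higher mean reward.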

Manifold Regularized Dynamic Network Pruning

  • Yehui Tang, Yunhe Wang, …, Chang Xu
  • Computer Science
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
A new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks (dubbed ManiDP) is proposed, showing better performance in terms of both accuracy and computational cost than state-of-the-art methods.

CHEX: CHannel EXploration for CNN Model Compression

This paper proposes to repeatedly prune and regrow the channels throughout the training process, which reduces the risk of pruning important channels prematurely, and can effectively reduce the FLOPs of diverse CNN architectures on a variety of computer vision tasks.
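The prune-and-regrow cycle described above can be illustrated with a minimal channel-mask update: keep the most important active channels, then reactivate a few pruned ones so that channels pruned prematurely can return later in training. This is a hypothetical sketch under simplified assumptions; the importance criterion and random regrowing here stand in for CHEX's actual criteria.

```python
import random

def prune_and_regrow(importance, mask, n_keep, n_regrow, rng):
    """One exploration step over a boolean channel mask:
    prune down to the n_keep most important active channels,
    then randomly reactivate n_regrow currently-pruned channels."""
    active = [i for i, m in enumerate(mask) if m]
    # prune: keep only the top-n_keep active channels by importance
    keep = set(sorted(active, key=lambda i: importance[i], reverse=True)[:n_keep])
    # regrow: give some pruned channels another chance
    pruned = [i for i in range(len(mask)) if i not in keep]
    regrown = set(rng.sample(pruned, min(n_regrow, len(pruned))))
    return [i in keep or i in regrown for i in range(len(mask))]

rng = random.Random(0)
importance = [0.9, 0.1, 0.5, 0.7, 0.2, 0.8]
mask = [True] * 6
mask = prune_and_regrow(importance, mask, n_keep=3, n_regrow=1, rng=rng)
```

Repeating this step throughout training, while gradually shrinking `n_regrow`, converges the mask toward a fixed channel budget without committing to early, possibly wrong, pruning decisions.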

Ensemble Knowledge Guided Sub-network Search and Fine-tuning for Filter Pruning

A novel sub-network search and fine-tuning method named Ensemble Knowledge Guidance (EKG), which experimentally shows that the fluctuation of the loss landscape is an effective metric for evaluating potential performance.

Sufficient Vision Transformer

In this paper, Sufficiency-Blocks (S-Blocks) are proposed to be applied across the depth of Suf-ViT to disentangle and discard task-irrelevant information accurately, and a Sufficient-Reduction Loss (SRLoss) is formulated leveraging the concept of Mutual Information (MI) that enables Suf-ViT to extract more reliable sufficient representations by removing task-irrelevant information.

Trainability Preserving Neural Structured Pruning

Trainability preserving pruning is presented, a regularization-based structured pruning method that maintains trainability during pruning for large-scale deep neural networks and can compete with the ground-truth dynamical isometry recovery method on linear MLP networks.

ScaleNet: Searching for the Model to Scale

Experimental results show that the architectures found by the proposed ScaleNet under various FLOPs budgets outperform the reference methods on various datasets, including ImageNet-1k and fine-tuning tasks.

Multi-scale alignment and Spatial ROI Module for COVID-19 Diagnosis

A deep spatial pyramid pooling (D-SPP) module is proposed to integrate contextual information over different resolutions, aiming to extract information under different scales of COVID-19 images effectively, together with a COVID-19 infection detection (CID) module that draws attention to the lesion area and removes interference from irrelevant information.

DyRep: Bootstrapping Training with Dynamic Re-parameterization

  • Tao Huang, Shan You, …, Chang Xu
  • Computer Science
    2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2022
A dynamic re-parameterization (DyRep) method, which encodes the Rep technique into the training process and dynamically evolves the network structure, applying Rep to enhance the representational capacity of important operations while suppressing the noisy and redundant operations that Rep introduces.

On Redundancy and Diversity in Cell-based Neural Architecture Search

An empirical post-hoc analysis of architectures from the popular cell-based search spaces finds that existing search spaces contain a high degree of redundancy: architecture performance is minimally sensitive to changes in large parts of the cells, and universally adopted designs significantly increase complexity but have very limited impact on performance.



AutoSlim: Towards One-Shot Architecture Search for Channel Numbers

A simple and one-shot solution to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size) is presented.
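The greedy slimming idea behind one-shot channel-number search can be sketched as: repeatedly trim channels from whichever layer hurts the estimated accuracy least, until the FLOPs budget is met. This is a hypothetical sketch; `estimate_accuracy` stands in for an evaluation through a slimmable supernet, and the toy accuracy and FLOPs models below are assumptions for illustration only.

```python
def greedy_slim(widths, estimate_accuracy, flops, budget, step=1):
    """Iteratively remove `step` channels from the layer whose slimming
    costs the least estimated accuracy, until total FLOPs fit the budget."""
    widths = list(widths)
    while flops(widths) > budget:
        candidates = []
        for i, w in enumerate(widths):
            if w > step:  # never slim a layer to zero channels
                trial = widths[:i] + [w - step] + widths[i + 1:]
                candidates.append((estimate_accuracy(trial), trial))
        if not candidates:
            break
        _, widths = max(candidates)  # keep the least-harmful slimming
    return widths

# toy model: accuracy depends more on layer 0; FLOPs is total channel count
acc = lambda ws: ws[0] * 2 + ws[1]   # stand-in for supernet evaluation
flops = lambda ws: sum(ws)
slimmed = greedy_slim([4, 4], acc, flops, budget=6)
```

On this toy problem the search correctly protects the more accuracy-critical layer 0 and slims layer 1 instead, ending at widths [4, 2].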

GreedyNAS: Towards Fast One-Shot NAS With Greedy Supernet

This paper proposes a multi-path sampling strategy with rejection, greedily filtering weak paths to ease the burden on the supernet by encouraging it to focus more on evaluating potentially-good paths, which are identified using a surrogate portion of validation data.

Data Agnostic Filter Gating For Efficient Deep Networks

  • Xiu Su, Shan You, …, Chang Xu
  • Computer Science
    ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2022
This paper proposes a data-agnostic filter pruning method that uses an auxiliary network, named the Dagger module, to induce pruning with the pre-trained weights as input, and utilizes an explicit FLOPs-aware regularization mechanism to directly push the pruned filters toward the target FLOPs.
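A FLOPs-aware regularizer of the kind mentioned can be sketched as a penalty added to the task loss that pushes the expected FLOPs of soft filter gates toward the budget. This is a minimal illustration under assumptions: the squared-penalty form, the gate parameterization, and all numbers below are not the Dagger module itself.

```python
def flops_of_gates(gates, flops_per_filter):
    """Expected FLOPs of a layer given soft filter gates in [0, 1]."""
    return sum(g * f for g, f in zip(gates, flops_per_filter))

def flops_regularized_loss(task_loss, gates, flops_per_filter, target_flops, lam):
    """Task loss plus a squared penalty pulling the gated FLOPs to the budget."""
    excess = flops_of_gates(gates, flops_per_filter) - target_flops
    return task_loss + lam * excess * excess

loss = flops_regularized_loss(
    task_loss=2.0,
    gates=[1.0, 0.5, 0.0, 1.0],       # soft per-filter gates
    flops_per_filter=[100, 100, 100, 100],
    target_flops=200,
    lam=1e-4,
)
```

With gated FLOPs of 250 against a budget of 200, the penalty adds 1e-4 * 50^2 = 0.25 to the task loss; gradients through the gates then trade accuracy against the budget during training.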

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

A new scaling method is proposed that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient; its effectiveness is demonstrated by scaling up MobileNets and ResNet.
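The compound scaling rule derives depth, width, and resolution multipliers from a single coefficient φ, using base constants found by a small grid search (α=1.2, β=1.1, γ=1.15 in the EfficientNet paper, constrained so that α·β²·γ² ≈ 2 and total FLOPs grow roughly 2^φ):

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for compound coefficient phi.
    alpha, beta, gamma are EfficientNet's grid-searched base constants."""
    return alpha ** phi, beta ** phi, gamma ** phi

# phi = 1 gives the step from the B0 baseline toward B1-scale models
d, w, r = compound_scale(phi=1)
```

Scaling all three dimensions jointly, rather than only width or only depth, is the paper's central point: a single φ keeps the network balanced as the FLOPs budget grows.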

Network Pruning via Transformable Architecture Search

This paper proposes applying neural architecture search to directly search for a network with flexible channel and layer sizes, breaking the structure limitation of pruned networks.

Rethinking the Value of Network Pruning

It is found that with optimal learning rate, the "winning ticket" initialization as used in Frankle & Carbin (2019) does not bring improvement over random initialization, and the need for more careful baseline evaluations in future research on structured pruning methods is suggested.

Approximated Oracle Filter Pruning for Destructive CNN Width Optimization

Approximated Oracle Filter Pruning (AOFP) is proposed, which keeps searching for the least important filters in a binary search manner, makes pruning attempts by masking out filters randomly, accumulates the resulting errors, and finetunes the model via a multi-path framework.

Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search

This paper introduces EnTranNAS, which is composed of Engine-cells and Transit-cells, and develops an architecture derivation method to replace the traditional hand-crafted rule, which enables differentiable sparsification and keeps the derived architecture equivalent to that of the Engine-cells, further improving the consistency between search and evaluation.

Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio

A new framework named network adjustment is presented, which considers network accuracy as a function of FLOPs, so that under each network configuration one can estimate the FLOPs utilization ratio (FUR) for each layer and use it to determine whether to increase or decrease the number of channels in that layer.

Multi-Scale Dense Networks for Resource Efficient Image Classification

Experiments demonstrate that the proposed framework substantially improves the existing state-of-the-art in both image classification with computational resource limits at test time and budgeted batch classification.