Corpus ID: 211677676

Channel Equilibrium Networks for Learning Deep Representation

@inproceedings{Shao2020ChannelEN,
  title={Channel Equilibrium Networks for Learning Deep Representation},
  author={Wenqi Shao and Shitao Tang and Xingang Pan and Ping Liang Tan and Xiaogang Wang and Ping Luo},
  booktitle={ICML},
  year={2020}
}
Convolutional Neural Networks (CNNs) are typically constructed by stacking multiple building blocks, each of which contains a normalization layer such as batch normalization (BN) and a rectified linear function such as ReLU. However, this work shows that the combination of normalization and rectified linear function leads to inhibited channels, which have small magnitude and contribute little to the learned feature representation, impeding the generalization ability of CNNs. Unlike prior arts… 
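As a rough, hypothetical illustration of the inhibited-channel phenomenon described in the abstract, the sketch below feeds random data through a Conv-BN-ReLU block and counts channels whose mean post-activation magnitude is tiny relative to the strongest channel. The threshold, the random input and the toy architecture are assumptions for illustration; this is a diagnostic, not the paper's Channel Equilibrium method.

```python
# Hypothetical diagnostic: estimate how many channels a BN+ReLU block "inhibits",
# i.e. channels whose post-activation magnitude is tiny compared to the rest.
import torch
import torch.nn as nn

torch.manual_seed(0)

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

x = torch.randn(32, 3, 32, 32)                       # dummy mini-batch
with torch.no_grad():
    y = block(x)                                     # (N, 64, H, W)

# Mean absolute response of each channel over batch and spatial dimensions.
channel_mag = y.abs().mean(dim=(0, 2, 3))            # (64,)
rel_mag = channel_mag / channel_mag.max().clamp_min(1e-12)

threshold = 0.01                                     # assumed cut-off
inhibited = (rel_mag < threshold).sum().item()
print(f"{inhibited}/{len(rel_mag)} channels fall below {threshold:.0%} "
      f"of the strongest channel's magnitude")
```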
Exemplar Normalization for Learning Deep Representation
TLDR
This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.
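A minimal sketch of the learning-to-normalize idea, approximated here as a softmax-weighted mixture of BN, IN and LN statistics in the spirit of switchable normalization. The class name MixedNorm2d, the static (non-sample-dependent) mixture weights and all shapes are simplifying assumptions rather than EN's actual formulation.

```python
# Sketch of a learn-to-normalize layer: a softmax-weighted mixture of BN, IN
# and LN statistics. A simplification, not Exemplar Normalization itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedNorm2d(nn.Module):
    def __init__(self, num_channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_channels))
        self.bias = nn.Parameter(torch.zeros(num_channels))
        # One importance logit per candidate normalizer (BN, IN, LN).
        self.logits = nn.Parameter(torch.zeros(3))

    def forward(self, x):                              # x: (N, C, H, W)
        n, c, h, w = x.shape
        mean_bn = x.mean(dim=(0, 2, 3), keepdim=True)
        var_bn = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        mean_in = x.mean(dim=(2, 3), keepdim=True)
        var_in = x.var(dim=(2, 3), unbiased=False, keepdim=True)
        mean_ln = x.mean(dim=(1, 2, 3), keepdim=True)
        var_ln = x.var(dim=(1, 2, 3), unbiased=False, keepdim=True)

        w_bn, w_in, w_ln = F.softmax(self.logits, dim=0)
        mean = w_bn * mean_bn + w_in * mean_in + w_ln * mean_ln
        var = w_bn * var_bn + w_in * var_in + w_ln * var_ln
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return x_hat * self.weight.view(1, c, 1, 1) + self.bias.view(1, c, 1, 1)

x = torch.randn(8, 16, 14, 14)
print(MixedNorm2d(16)(x).shape)
```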
Deep Multimodal Fusion by Channel Exchanging
TLDR
Channel-Exchanging-Network is proposed, a parameter-free multimodal fusion framework that dynamically exchanges channels between sub-networks of different modalities; the exchange is self-guided by individual channel importance, measured by the magnitude of the Batch Normalization (BN) scaling factor during training.
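An illustrative sketch of BN-gamma-guided channel exchange: channels whose BN scaling factor is small in one modality are overwritten by the corresponding channels of the other modality. The threshold, the function name exchange_channels and the toy shapes are assumptions; the full CEN method involves additional details not shown here.

```python
# Sketch of BN-gamma-guided channel exchange between two modality sub-networks.
import torch
import torch.nn as nn

def exchange_channels(feat_a, feat_b, bn_a, bn_b, threshold=1e-2):
    """feat_*: (N, C, H, W) features of two modalities; bn_*: their BatchNorm2d layers."""
    gamma_a = bn_a.weight.detach().abs()                # (C,)
    gamma_b = bn_b.weight.detach().abs()
    mask_a = (gamma_a < threshold).view(1, -1, 1, 1)    # "unimportant" channels of A
    mask_b = (gamma_b < threshold).view(1, -1, 1, 1)
    out_a = torch.where(mask_a, feat_b, feat_a)         # A borrows B's channels
    out_b = torch.where(mask_b, feat_a, feat_b)         # and vice versa
    return out_a, out_b

bn_a, bn_b = nn.BatchNorm2d(32), nn.BatchNorm2d(32)
fa, fb = torch.randn(4, 32, 8, 8), torch.randn(4, 32, 8, 8)
oa, ob = exchange_channels(bn_a(fa), bn_b(fb), bn_a, bn_b)
print(oa.shape, ob.shape)
```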
Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction
TLDR
Channel-Exchanging-Network (CEN) is proposed, which is self-adaptive, parameter-free and, more importantly, applicable to both multimodal fusion and multitask learning; it dynamically exchanges channels between sub-networks of different modalities.
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
TLDR
A unified picture of the main motivation behind different approaches from the perspective of optimization is provided, and a taxonomy for understanding the similarities and differences between them is presented.
Group Whitening: Balancing Learning Efficiency and Representational Capacity
  • Lei Huang, Li Liu, F. Zhu, L. Shao
  • Computer Science, Mathematics
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2021
TLDR
This paper proposes group whitening (GW), which exploits the advantages of the whitening operation while avoiding the disadvantages of normalization within mini-batches; it also analyses the constraints that normalization imposes on features and shows, from the perspective of the model's representational capacity, how the batch size affects the performance of batch (group) normalized networks.
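A rough sketch of the group-whitening idea: per sample, the channels are split into groups and each group is ZCA-whitened using statistics computed over its own channels and spatial positions, so no mini-batch statistics are needed. The group count, epsilon and the eigendecomposition-based whitening below are illustrative assumptions, not necessarily GW's exact procedure.

```python
# Per-sample group whitening sketch: ZCA-whiten each channel group using its own
# spatial statistics, avoiding any dependence on the mini-batch.
import torch

def group_whiten(x, num_groups=4, eps=1e-5):
    n, c, h, w = x.shape
    g = c // num_groups
    xg = x.view(n * num_groups, g, h * w)               # (N*G, g, HW)
    mean = xg.mean(dim=2, keepdim=True)
    xc = xg - mean
    cov = xc @ xc.transpose(1, 2) / (h * w)             # (N*G, g, g)
    cov = cov + eps * torch.eye(g, device=x.device)
    # ZCA whitening: Sigma^{-1/2} via eigendecomposition.
    evals, evecs = torch.linalg.eigh(cov)
    inv_sqrt = evecs @ torch.diag_embed(evals.clamp_min(eps).rsqrt()) @ evecs.transpose(1, 2)
    out = inv_sqrt @ xc
    return out.view(n, c, h, w)

x = torch.randn(2, 16, 8, 8)
print(group_whiten(x).shape)
```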

References

SHOWING 1-10 OF 49 REFERENCES
Exemplar Normalization for Learning Deep Representation
TLDR
This work investigates a novel dynamic learning-to-normalize (L2N) problem by proposing Exemplar Normalization (EN), which is able to learn different normalization methods for different convolutional layers and image samples of a deep network.
Squeeze-and-Excitation Networks
TLDR
This work proposes a novel architectural unit, termed the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and shows that these blocks can be stacked to form SENet architectures that generalise extremely effectively across different datasets.
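A minimal SE block following the description above: global average pooling (squeeze), a two-layer bottleneck with sigmoid gating (excitation), and channel-wise rescaling; the reduction ratio of 16 is the common default.

```python
# Minimal Squeeze-and-Excitation block.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                     # x: (N, C, H, W)
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                # squeeze: (N, C)
        w = self.fc(s).view(n, c, 1, 1)       # excitation: per-channel gates
        return x * w                          # recalibrate the input features

x = torch.randn(2, 64, 14, 14)
print(SEBlock(64)(x).shape)
```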
SSN: Learning Sparse Switchable Normalization via SparsestMax
TLDR
Sparse switchable normalization (SSN) is presented, in which the importance ratios are constrained to be sparse via SparsestMax, a sparse version of softmax that is guaranteed to select only one normalizer for each normalization layer, avoiding redundant computation and improving the interpretability of normalizer selection.
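SparsestMax itself gradually drives a simplex projection toward a one-hot solution and is more involved than shown here; as a related illustration only, the sketch below implements plain sparsemax, a sparse alternative to softmax that can already zero out the importance ratios of some normalizers.

```python
# Sparsemax (Martins & Astudillo, 2016) as a sparse stand-in for softmax;
# not the SparsestMax algorithm of SSN.
import torch

def sparsemax(z):
    """Project a 1-D logit vector onto the probability simplex, allowing exact zeros."""
    z_sorted, _ = torch.sort(z, descending=True)
    k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
    cumsum = torch.cumsum(z_sorted, dim=0)
    support = 1 + k * z_sorted > cumsum        # coordinates that stay nonzero
    k_star = support.sum()
    tau = (cumsum[k_star - 1] - 1) / k_star
    return torch.clamp(z - tau, min=0.0)

logits = torch.tensor([2.0, 0.5, -1.0])        # e.g. importance logits of three normalizers
print(sparsemax(logits))                       # sparse weights summing to 1
```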
On the importance of single directions for generalization
TLDR
It is found that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance.
Decorrelated Batch Normalization
TLDR
This work proposes Decorrelated Batch Normalization (DBN), which not only centers and scales activations but also whitens them, and shows that DBN can improve the performance of BN on multilayer perceptrons and convolutional neural networks.
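A sketch of the whitening step behind DBN: activations are centered and ZCA-whitened using the covariance of the mini-batch. The eigendecomposition-based Sigma^{-1/2}, the epsilon and the fully-connected setting are illustrative simplifications; DBN's running statistics, channel grouping and backpropagation through the whitening are omitted.

```python
# ZCA whitening of a mini-batch of activations, the core operation behind DBN.
import torch

def zca_whiten_batch(x, eps=1e-5):
    """x: (N, C) mini-batch of activations; returns whitened activations."""
    xc = x - x.mean(dim=0, keepdim=True)
    cov = xc.t() @ xc / x.shape[0] + eps * torch.eye(x.shape[1])
    evals, evecs = torch.linalg.eigh(cov)
    inv_sqrt = evecs @ torch.diag(evals.rsqrt()) @ evecs.t()   # Sigma^{-1/2}
    return xc @ inv_sqrt

x = torch.randn(128, 16)
xw = zca_whiten_batch(x)
# The whitened covariance should be close to the identity.
print((xw.t() @ xw / x.shape[0] - torch.eye(16)).abs().max())
```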
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
TLDR
Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Group Normalization
TLDR
Group Normalization can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks.
Channel Pruning for Accelerating Very Deep Neural Networks
  • Yihui He, X. Zhang, Jian Sun
  • Computer Science
    2017 IEEE International Conference on Computer Vision (ICCV)
  • 2017
TLDR
This paper proposes an iterative two-step algorithm to effectively prune each layer, using LASSO-regression-based channel selection followed by least-squares reconstruction, and generalizes the algorithm to multi-layer and multi-branch cases.
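A toy sketch of LASSO-guided channel selection followed by least-squares reconstruction, shown for a 1x1-conv-like linear layer rather than real feature maps; the regularization strength, the weak-channel setup and all shapes are arbitrary assumptions for illustration.

```python
# Toy channel pruning: LASSO selects channels, least squares refits the weights.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, C, n_out = 512, 32, 16
X = rng.standard_normal((N, C))                   # input channels (flattened)
W = rng.standard_normal((n_out, C))               # original weights
W[:, C // 2:] *= 0.1                              # make half the channels weak contributors
Y = X @ W.T                                       # responses to reconstruct

# Column i of the design matrix is the flattened contribution of channel i.
A = np.stack([np.outer(X[:, i], W[:, i]).ravel() for i in range(C)], axis=1)
beta = Lasso(alpha=0.05, fit_intercept=False, max_iter=10000).fit(A, Y.ravel()).coef_
keep = np.flatnonzero(np.abs(beta) > 1e-6)        # selected channels
print(f"kept {keep.size}/{C} channels")

# Least-squares reconstruction on the kept channels; W_new.T plays the role of
# the pruned layer's weight matrix.
W_new, *_ = np.linalg.lstsq(X[:, keep], Y, rcond=None)
err = np.linalg.norm(Y - X[:, keep] @ W_new) / np.linalg.norm(Y)
print(f"relative reconstruction error: {err:.3f}")
```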
Understanding deep learning requires rethinking generalization
TLDR
These experiments establish that state-of-the-art convolutional networks for image classification, trained with stochastic gradient methods, easily fit a random labeling of the training data, and confirm that simple depth-two neural networks already have perfect finite-sample expressivity.
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
TLDR
The "exponential linear unit" (ELU) which speeds up learning in deep neural networks and leads to higher classification accuracies and significantly better generalization performance than ReLUs and LReLUs on networks with more than 5 layers.