An Investigation Into the Stochasticity of Batch Whitening

@article{Huang2020AnII,
  title={An Investigation Into the Stochasticity of Batch Whitening},
  author={Lei Huang and Lei Zhao and Yi Zhou and F. Zhu and Li Liu and L. Shao},
  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020},
  pages={6438-6447}
}
Batch Normalization (BN) is extensively employed in various network architectures by performing standardization within mini-batches. A full understanding of the process has been a central target in the deep learning community. Unlike existing works, which usually only analyze the standardization operation, this paper investigates the more general Batch Whitening (BW). Our work originates from the observation that while various whitening transformations equivalently improve the conditioning…
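For context on the standardization the abstract refers to, here is a minimal NumPy sketch of per-feature mini-batch standardization; the array layout, epsilon, and function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def batch_standardize(x, eps=1e-5):
    """Standardize a mini-batch x of shape (batch, features) so that
    each feature has zero mean and unit variance across the batch."""
    mu = x.mean(axis=0, keepdims=True)        # per-feature mean
    var = x.var(axis=0, keepdims=True)        # per-feature variance
    return (x - mu) / np.sqrt(var + eps)

# Example: a small mini-batch of 4 samples with 3 features.
x = np.random.randn(4, 3) * 5.0 + 2.0
x_hat = batch_standardize(x)
print(x_hat.mean(axis=0), x_hat.var(axis=0))  # ~0 and ~1 per feature
```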
Improving Generalization of Batch Whitening by Convolutional Unit Optimization
Batch Whitening is a technique that accelerates and stabilizes training by transforming input features to have a zero mean (Centering) and a unit variance (Scaling), and by removing linear correlation between features (Decorrelation)…
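The whitening step described above can be sketched with a basic ZCA transform over a mini-batch, computing the inverse square root of the feature covariance by eigendecomposition; the shapes, epsilon, and function name below are illustrative assumptions rather than any paper's implementation.

```python
import numpy as np

def zca_batch_whiten(x, eps=1e-5):
    """ZCA-whiten a mini-batch x of shape (batch, features): center it, then
    decorrelate with the inverse square root of the feature covariance."""
    xc = x - x.mean(axis=0, keepdims=True)                # Centering
    cov = xc.T @ xc / x.shape[0]                          # (features, features) covariance
    eigval, eigvec = np.linalg.eigh(cov)                  # symmetric eigendecomposition
    inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return xc @ inv_sqrt                                  # Scaling + Decorrelation

x = np.random.randn(64, 8) @ np.random.randn(8, 8)       # correlated features
xw = zca_batch_whiten(x)
print(np.round(xw.T @ xw / x.shape[0], 2))                # ≈ identity covariance
```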
Normalization Techniques in Training DNNs: Methodology, Analysis and Application
TLDR
A unified picture of the main motivation behind different approaches from the perspective of optimization is provided, and a taxonomy for understanding the similarities and differences between them is presented.
Robust Differentiable SVD
  • Wei Wang, Zheng Dang, Yinlin Hu, P. Fua, M. Salzmann
  • Computer Science, Medicine
  • IEEE transactions on pattern analysis and machine intelligence
  • 2021
TLDR
It is shown that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using power iteration (PI), but without relying on an iterative process in practice, and thus yields more accurate gradients, which results in increased accuracy for image classification and style transfer.
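The PI referred to above is power iteration; as background, a minimal sketch of the iteration for the dominant eigenpair of a symmetric matrix (the iteration count and function name are illustrative assumptions).

```python
import numpy as np

def power_iteration(A, num_iters=100):
    """Approximate the dominant eigenpair of a symmetric matrix A."""
    v = np.random.randn(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A @ v                     # apply the matrix
        v /= np.linalg.norm(v)        # renormalize
    return v @ A @ v, v               # (eigenvalue estimate, eigenvector estimate)

A = np.random.randn(5, 5)
A = A @ A.T                           # symmetric positive semi-definite test matrix
lam, v = power_iteration(A)
print(lam, np.max(np.linalg.eigvalsh(A)))  # the two values should be close
```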
Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling?
  • Yue Song, N. Sebe, Wei Wang
  • Computer Science
  • ArXiv
  • 2021
TLDR
A hybrid training protocol is proposed for SVD-based GCP meta-layers such that competitive performance can be achieved against Newton–Schulz iteration, and a new GCP meta-layer is proposed that uses SVD in the forward pass and Padé approximants in the backward pass to compute the gradients.
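The Newton–Schulz iteration mentioned above can be sketched as the standard coupled iteration for the matrix square root; the pre-scaling scheme, iteration count, and tolerance below are illustrative assumptions, not the GCP layer's actual implementation.

```python
import numpy as np

def newton_schulz_sqrt(A, num_iters=20):
    """Approximate the square root of a symmetric positive-definite matrix A
    with the coupled Newton-Schulz iteration (pre-scaled so the iteration converges)."""
    norm = np.linalg.norm(A)          # Frobenius norm used for pre-scaling
    Y = A / norm
    Z = np.eye(A.shape[0])
    I = np.eye(A.shape[0])
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T                     # Y -> (A / norm)^{1/2}
        Z = T @ Z                     # Z -> (A / norm)^{-1/2}
    return Y * np.sqrt(norm)          # undo the pre-scaling

A = np.random.randn(6, 6)
A = A @ A.T + np.eye(6)               # symmetric positive definite test matrix
S = newton_schulz_sqrt(A)
print(np.allclose(S @ S, A, atol=1e-3))
```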
Group Whitening: Balancing Learning Efficiency and Representational Capacity
TLDR
This paper proposes Group Whitening (GW), which exploits the advantages of the whitening operation while avoiding the disadvantages of normalization within mini-batches.
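A rough sketch of the group whitening idea, assuming statistics are computed per sample within groups of channels; the group count, shapes, and ZCA routine below are illustrative assumptions rather than GW's exact formulation.

```python
import numpy as np

def group_whiten(x, num_groups=4, eps=1e-5):
    """Sketch of group whitening: for each sample, split the channels into groups
    and ZCA-whiten each group using that sample's own spatial statistics.
    x has shape (batch, channels, height, width)."""
    n, c, h, w = x.shape
    out = np.empty_like(x)
    for i in range(n):
        groups = x[i].reshape(num_groups, c // num_groups, h * w).copy()
        for j in range(num_groups):
            f = groups[j] - groups[j].mean(axis=1, keepdims=True)   # center over positions
            cov = f @ f.T / f.shape[1]                               # small per-group covariance
            eigval, eigvec = np.linalg.eigh(cov)
            inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
            groups[j] = inv_sqrt @ f
        out[i] = groups.reshape(c, h, w)
    return out

x = np.random.randn(2, 8, 4, 4)
print(group_whiten(x).shape)   # (2, 8, 4, 4), no dependence on the batch size
```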

References

Showing 1–10 of 52 references
Whitening and Coloring Batch Transform for GANs
TLDR
It is shown that the proposed conditional coloring can represent categorical conditioning information, which largely helps the cGAN's qualitative results, and that full-feature whitening is important in a general GAN scenario in which the training process is known to be highly unstable.
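A rough sketch of the whitening-and-coloring idea: ZCA-whiten the mini-batch, then re-project it with a learned "coloring" matrix and bias. The fixed stand-in parameters below are illustrative assumptions, not the paper's conditional coloring.

```python
import numpy as np

def whiten_and_color(x, coloring_matrix, bias, eps=1e-5):
    """Sketch of a whitening-and-coloring transform: ZCA-whiten the mini-batch,
    then apply a linear 'coloring' (here a fixed matrix and bias stand in for learned ones)."""
    xc = x - x.mean(axis=0, keepdims=True)
    cov = xc.T @ xc / x.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    whitened = xc @ eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return whitened @ coloring_matrix + bias              # learned re-correlation

d = 8
x = np.random.randn(32, d)
W = np.random.randn(d, d) * 0.1 + np.eye(d)               # stand-in for a learned coloring matrix
b = np.zeros(d)                                            # stand-in for a learned bias
print(whiten_and_color(x, W, b).shape)                     # (32, 8)
```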
Iterative Normalization: Beyond Standardization Towards Efficient Whitening
TLDR
This work proposes Iterative Normalization (IterNorm), which employs Newton's iterations for much more efficient whitening while avoiding eigendecomposition, and introduces Stochastic Normalization Disturbance (SND), which measures the inherent stochastic uncertainty of samples under normalization operations.
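A sketch of IterNorm-style whitening under its usual formulation: Newton's iterations on the trace-normalized covariance approximate its inverse square root without eigendecomposition. The iteration count, epsilon, and function name are illustrative assumptions.

```python
import numpy as np

def iternorm_whiten(x, num_iters=5, eps=1e-5):
    """Sketch of IterNorm-style whitening: approximate the inverse square root of the
    mini-batch covariance with Newton's iterations instead of eigendecomposition."""
    xc = x - x.mean(axis=0, keepdims=True)
    d = x.shape[1]
    sigma = xc.T @ xc / x.shape[0] + eps * np.eye(d)
    trace = np.trace(sigma)
    sigma_n = sigma / trace                                # trace-normalize so the iteration converges
    P = np.eye(d)
    for _ in range(num_iters):
        P = 0.5 * (3.0 * P - P @ P @ P @ sigma_n)          # Newton's iteration toward sigma_n^{-1/2}
    return xc @ (P / np.sqrt(trace))                       # apply the approximate whitening matrix

x = np.random.randn(128, 8) @ np.random.randn(8, 8)
xw = iternorm_whiten(x)
print(np.round(xw.T @ xw / x.shape[0], 2))                 # pushed toward the identity (approximate whitening)
```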
Decorrelated Batch Normalization
TLDR
This work proposes Decorrelated Batch Normalization (DBN), which not only centers and scales activations but also whitens them, and shows that DBN can improve the performance of BN on multilayer perceptrons and convolutional neural networks.
A Large-Scale Study on Regularization and Normalization in GANs
TLDR
This work takes a sober view of the current state of GANs from a practical perspective, discusses and evaluates common pitfalls and reproducibility issues, open-sources the code on GitHub, and provides pre-trained models on TensorFlow Hub.
EvalNorm: Estimating Batch Normalization Statistics for Evaluation
TLDR
EvalNorm is proposed to address the degradation of batch normalization when training with small minibatches by estimating corrected normalization statistics to use for BN during evaluation, and it yields large gains for models trained with smaller batches.
Understanding Batch Normalization
TLDR
It is shown that BN primarily enables training with larger learning rates, which is the cause of faster convergence and better generalization; the results are contrasted against recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.
Improved Techniques for Training GANs
TLDR
This work focuses on two applications of GANs: semi-supervised learning and the generation of images that humans find visually realistic; it presents ImageNet samples with unprecedented resolution and shows that the proposed methods enable the model to learn recognizable features of ImageNet classes.
Group Normalization
TLDR
Group Normalization can outperform its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks.
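A minimal sketch of group normalization statistics (the learned affine parameters are omitted); the group count and epsilon below are illustrative assumptions.

```python
import numpy as np

def group_norm(x, num_groups=4, eps=1e-5):
    """Sketch of Group Normalization: normalize each sample over groups of channels,
    independent of the batch size. x has shape (batch, channels, height, width)."""
    n, c, h, w = x.shape
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    mu = xg.mean(axis=(2, 3, 4), keepdims=True)            # per-sample, per-group mean
    var = xg.var(axis=(2, 3, 4), keepdims=True)            # per-sample, per-group variance
    return ((xg - mu) / np.sqrt(var + eps)).reshape(n, c, h, w)

x = np.random.randn(2, 8, 4, 4)
print(group_norm(x).shape)   # (2, 8, 4, 4) -- statistics do not depend on the batch dimension
```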
Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization
TLDR
It is argued that this acceleration occurs because Batch Normalization splits the optimization task into optimizing the length and direction of the parameters separately, which allows gradient-based methods to leverage a favourable global structure in the loss landscape.
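The length-direction decoupling referred to above can be illustrated by splitting a weight vector into its norm and unit direction, in the spirit of weight normalization; this is an illustration of the idea, not the paper's exact analysis.

```python
import numpy as np

def decouple_length_direction(w):
    """Split a weight vector into its length g and unit direction v,
    so that w = g * v and the two factors can be optimized separately."""
    g = np.linalg.norm(w)
    v = w / g
    return g, v

w = np.array([3.0, 4.0])
g, v = decouple_length_direction(w)
print(g, v, np.allclose(g * v, w))   # 5.0 [0.6 0.8] True
```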
GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium
TLDR
This work proposes a two time-scale update rule (TTUR) for training GANs with stochastic gradient descent on arbitrary GAN loss functions and introduces the "Fréchet Inception Distance" (FID), which captures the similarity of generated images to real ones better than the Inception Score.
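The FID mentioned above compares Gaussian fits to real and generated feature statistics; a minimal sketch using SciPy's matrix square root, assuming the inputs are pre-computed feature matrices (the Inception feature extraction is omitted).

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two feature sets,
    each of shape (num_samples, feature_dim)."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov1 = np.cov(feats_real, rowvar=False)
    cov2 = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):                  # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean)

real = np.random.randn(500, 16)
fake = np.random.randn(500, 16) + 0.5
print(frechet_distance(real, fake))               # larger when the two sets differ more
```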