Corpus ID: 232092883

Learning with Hyperspherical Uniformity

@article{Liu2021LearningWH,
  title={Learning with Hyperspherical Uniformity},
  author={Weiyang Liu and Rongmei Lin and Zhen Liu and Li Xiong and Bernhard Sch\"olkopf and Adrian Weller},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.01649}
}
Owing to their over-parameterized nature, neural networks are a powerful tool for nonlinear function approximation. To achieve good generalization on unseen data, a suitable inductive bias is of great importance for neural networks. One of the most straightforward ways to impose such a bias is to regularize the network with additional objectives. ℓ2 regularization serves as a standard regularization for neural networks. Despite its popularity, it essentially regularizes one dimension of the…
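The paper's title points to hyperspherical uniformity as the proposed inductive bias: instead of shrinking neuron norms, it spreads the directions of neurons uniformly over the unit hypersphere. As a rough, hedged illustration (not the paper's exact formulation), the sketch below adds a minimum-hyperspherical-energy-style pairwise repulsion on ℓ2-normalized neuron directions to an ordinary training loss; the 1/r potential, the `lambda_hu` weight, and the toy model are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def hyperspherical_energy(weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Pairwise 1/r energy of neuron directions on the unit hypersphere.

    `weight` holds one neuron per row (e.g. a Linear layer's weight matrix).
    Lower energy means the neuron directions are spread more uniformly.
    """
    directions = F.normalize(weight, dim=1)   # project each neuron onto the unit sphere
    dists = F.pdist(directions)               # pairwise Euclidean distances between directions
    return (1.0 / (dists + eps)).mean()       # repulsive potential, large when neurons collapse

# Toy usage (assumed model and weighting): add the energy of every Linear layer to the task loss.
model = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10))
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
lambda_hu = 1e-3  # regularization strength, chosen arbitrarily here
reg = sum(hyperspherical_energy(m.weight) for m in model if isinstance(m, torch.nn.Linear))
loss = F.cross_entropy(model(x), y) + lambda_hu * reg
loss.backward()
```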

Citations

Orthogonal Over-Parameterized Training
TLDR
A novel orthogonal over-parameterized training (OPT) framework is proposed that can provably minimize the hyperspherical energy characterizing the diversity of neurons on a hypersphere, and the analysis reveals that learning a proper coordinate system for neurons is crucial to generalization.
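To make the OPT idea above concrete, here is a hedged sketch in which the randomly initialized neurons stay fixed and only an orthogonal transform applied to them is learned; because an orthogonal map preserves pairwise angles, the hyperspherical energy of the initial neurons is preserved during training. The Cayley parameterization, layer sizes, and class name are assumptions; OPT itself considers several ways of parameterizing the orthogonal matrix.

```python
import torch
import torch.nn as nn

class OrthogonallyTrainedLinear(nn.Module):
    """Linear layer whose fixed random neurons are only rotated/reflected during training."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Fixed, randomly initialized neuron directions (never updated).
        self.register_buffer("w0", torch.randn(out_features, in_features) / in_features ** 0.5)
        # Unconstrained parameter mapped to a skew-symmetric matrix in `orthogonal`.
        self.a = nn.Parameter(torch.zeros(in_features, in_features))

    def orthogonal(self) -> torch.Tensor:
        # Cayley transform: (I + A)^{-1} (I - A) is orthogonal whenever A is skew-symmetric.
        A = self.a - self.a.T
        I = torch.eye(A.shape[0], device=A.device, dtype=A.dtype)
        return torch.linalg.solve(I + A, I - A)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each effective neuron is an orthogonal transform of the corresponding fixed neuron.
        return x @ (self.w0 @ self.orthogonal()).T

layer = OrthogonallyTrainedLinear(64, 128)
out = layer(torch.randn(8, 64))  # only the orthogonal transform receives gradients
```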
SphereFace Revived: Unifying Hyperspherical Face Recognition
TLDR
This paper introduces a unified framework to understand large angular margin in hyperspherical face recognition, and extends the study of SphereFace and proposes an improved variant with substantially better training stability -- SphereFace-R.
Iterative Teaching by Label Synthesis
TLDR
This paper proposes a label synthesis teaching framework where the teacher randomly selects input teaching examples and then synthesizes suitable outputs for them, and shows that this framework can avoid costly example selection while still provably achieving exponential teachability.
The Principle of Diversity: Training Stronger Vision Transformers Calls for Reducing All Levels of Redundancy
TLDR
This paper systematically studies the ubiquitous redundancy in ViTs at all three levels: patch embedding, attention map, and weight space, and advocates a principle of diversity for training ViTs by presenting corresponding regularizers that encourage representation diversity and coverage at each of those levels, enabling the capture of more discriminative information.
Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee
TLDR
A new deep kernel survival model called a survival kernet is proposed, which scales to large datasets in a manner amenable to model interpretation and theoretical analysis, and a finite-sample error bound on predicted survival distributions is established that is, up to a log factor, optimal.
Maximum Class Separation as Inductive Bias in One Matrix
TLDR
The main observation behind the approach is that separation does not require optimization but can be solved in closed form prior to training and plugged into a network, and it is found empirically that maximum separation works best as a fixed bias; making the matrix learnable adds nothing to the performance.
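As a hedged sketch of that closed-form separation: C unit vectors with pairwise cosine similarity of exactly -1/(C-1), the maximum possible separation, can be written down without optimization and plugged in as a fixed classifier matrix. The construction below embeds the vectors in R^C for simplicity; the paper derives an equivalent matrix directly in C-1 dimensions, and the usage lines here are illustrative assumptions.

```python
import torch

def max_separation_prototypes(num_classes: int) -> torch.Tensor:
    """C unit vectors whose pairwise cosine similarity is exactly -1/(C-1)."""
    C = num_classes
    protos = (C / (C - 1)) ** 0.5 * (torch.eye(C) - torch.full((C, C), 1.0 / C))
    return protos  # rows are unit-norm simplex vertices; the matrix has rank C-1

P = max_separation_prototypes(10)
print(P @ P.T)                     # 1 on the diagonal, -1/9 everywhere else
features = torch.randn(4, 10)      # hypothetical C-dimensional embeddings
logits = features @ P.T            # fixed, non-learnable classifier head
```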
Hyperspherical Consistency Regularization
TLDR
This work proposes hyperspherical consistency regularization (HCR), a simple yet effective plug-and-play method to regularize the classifier using feature-dependent information and thus avoid bias from labels.
RELAX: Representation Learning Explainability
TLDR
This work proposes the first approach for attribution-based explanations of representations by measuring similarities in the representation space between an input and masked out versions of itself, providing intuitive explanations and significantly outperforming the gradient-based baseline.
SphereFace2: Binary Classification is All You Need for Deep Face Recognition
TLDR
This paper identifies the discrepancy between training and evaluation in the existing multi-class classification framework, discusses the potential limitations caused by the “competitive” nature of softmax normalization, and proposes a novel binary classification training framework, termed SphereFace2, which effectively bridges the gap between training and evaluation.
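A hedged, simplified sketch of the binary-classification recipe: each class gets its own one-vs-all logistic loss on a scaled cosine score instead of a single softmax over all classes. The scale `s`, margin `m`, tensor shapes, and the omission of SphereFace2's additional bias and re-weighting terms are simplifications, so this illustrates the general idea rather than the exact SphereFace2 loss.

```python
import torch
import torch.nn.functional as F

def one_vs_all_cosine_loss(embeddings, class_weights, labels, s=32.0, m=0.2):
    """C independent binary logistic losses on cosine scores, instead of one softmax."""
    cos = F.normalize(embeddings, dim=1) @ F.normalize(class_weights, dim=1).T  # (N, C)
    targets = F.one_hot(labels, num_classes=class_weights.shape[0]).float()
    # The margin makes the positive score harder to satisfy and the negatives harder to suppress.
    logits = s * (cos - m * targets + m * (1.0 - targets))
    return F.binary_cross_entropy_with_logits(logits, targets)

emb = torch.randn(8, 128, requires_grad=True)      # hypothetical face embeddings
W = torch.randn(1000, 128, requires_grad=True)     # hypothetical per-class weights
loss = one_vs_all_cosine_loss(emb, W, torch.randint(0, 1000, (8,)))
loss.backward()
```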

References

SHOWING 1-10 OF 93 REFERENCES
Learning towards Minimum Hyperspherical Energy
TLDR
The redundancy regularization problem is reduced to generic energy minimization, and a minimum hyperspherical energy (MHE) objective is proposed as generic regularization for neural networks.
Orthogonal Over-Parameterized Training
TLDR
A novel orthogonal over-parameterized training (OPT) framework is proposed that can provably minimize the hyperspherical energy characterizing the diversity of neurons on a hypersphere, and the analysis reveals that learning a proper coordinate system for neurons is crucial to generalization.
Regularizing Neural Networks via Minimizing Hyperspherical Energy
TLDR
The compressive minimum hyperspherical energy (CoMHE) is proposed as a more effective regularization for neural networks that consistently outperforms existing regularization methods, and can be easily applied to different neural networks.
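A hedged sketch of the compressive step described above, reusing the pairwise-energy idea from the earlier sketch: neurons are first mapped into a low-dimensional space and the hyperspherical energy is computed there. The random projection, its dimension, and the 1/r potential are assumptions; CoMHE considers more than one way of constructing the projection.

```python
import torch
import torch.nn.functional as F

def compressive_hyperspherical_energy(weight, proj_dim=8, eps=1e-6):
    """Project neurons into a low-dimensional space, then compute pairwise 1/r energy there."""
    proj = torch.randn(weight.shape[1], proj_dim) / proj_dim ** 0.5  # random projection (assumed)
    directions = F.normalize(weight @ proj, dim=1)                   # compressed neuron directions
    return (1.0 / (F.pdist(directions) + eps)).mean()

reg = compressive_hyperspherical_energy(torch.randn(128, 64, requires_grad=True))
reg.backward()
```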
Hyperspherical Variational Auto-Encoders
TLDR
This work proposes using a von Mises-Fisher distribution instead of a Gaussian distribution for both the prior and posterior of the Variational Auto-Encoder, leading to a hyperspherical latent space.
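For concreteness, the sketch below evaluates the von Mises-Fisher log-density on the unit hypersphere, the distribution that the summary above uses for the prior and posterior; the exponentially scaled Bessel function keeps the normalizer numerically stable. The sampling and reparameterization machinery of the actual hyperspherical VAE is omitted, and the dimension and concentration values are arbitrary.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function of the first kind

def vmf_log_density(x, mu, kappa):
    """log p(x) under a von Mises-Fisher distribution on the unit sphere S^{p-1}.

    x and mu are unit vectors of dimension p; kappa > 0 is the concentration.
    """
    p = mu.shape[-1]
    # log C_p(kappa) = (p/2 - 1) log kappa - (p/2) log(2 pi) - log I_{p/2-1}(kappa),
    # where log I_v(kappa) = log(ive(v, kappa)) + kappa avoids overflow for large kappa.
    log_norm = (p / 2 - 1) * np.log(kappa) - (p / 2) * np.log(2 * np.pi) \
               - (np.log(ive(p / 2 - 1, kappa)) + kappa)
    return log_norm + kappa * np.dot(mu, x)

mu = np.eye(16)[0]                          # mean direction on S^15
print(vmf_log_density(mu, mu, kappa=20.0))  # log-density at the mode
```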
Diverse Neural Network Learns True Target Functions
TLDR
This paper analyzes one-hidden-layer neural networks with ReLU activation, shows that, despite the non-convexity, networks with diverse units have no spurious local minima, and suggests a novel regularization function to promote unit diversity for potentially better generalization.
Improving the Generalization Performance of Multi-class SVM via Angular Regularization
TLDR
This paper proposes a novel angular regularizer based on the singular values of the coefficient matrix, where the uniformity of singular values reduces the correlation among different classes and drives the angles between coefficient vectors to increase.
Unitary Evolution Recurrent Neural Networks
TLDR
This work constructs an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned, and demonstrates the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.
Deep Hyperspherical Learning
TLDR
Deep hyperspherical convolution networks (SphereNets), distinct from conventional inner-product-based convolutional networks, are introduced, and it is shown that SphereNet can effectively encode discriminative representations and alleviate training difficulty, leading to easier optimization, faster convergence, and comparable (or even better) classification accuracy than its convolutional counterparts.
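A hedged sketch of the angular operator behind the SphereNet summary above: the response of a unit is a function of the angle between the normalized input and the normalized filter rather than their raw inner product. The fully connected form, toy shapes, and clamping constant are assumptions; the linear response g(theta) = 1 - 2*theta/pi is one simple choice in the spirit of the paper's SphereConv operators.

```python
import math
import torch
import torch.nn.functional as F

def sphere_linear(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """Angle-based alternative to a plain inner product (fully connected analogue of SphereConv).

    x: (N, d) inputs, weight: (K, d) filters; the output lies in [-1, 1] per filter.
    """
    cos = F.normalize(x, dim=1) @ F.normalize(weight, dim=1).T  # cosine of the angle
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))          # angle in [0, pi]
    return 1.0 - 2.0 * theta / math.pi                          # +1 when aligned, -1 when opposite

out = sphere_linear(torch.randn(8, 64), torch.randn(32, 64))
```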
Understanding the difficulty of training deep feedforward neural networks
TLDR
The objective is to better understand why standard gradient descent from random initialization does so poorly with deep neural networks, to shed light on recent relative successes, and to help design better algorithms in the future.
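The practical takeaway from that analysis is the now-standard Xavier/Glorot initialization, which scales weights by fan-in and fan-out so that activation and gradient variances stay roughly constant across layers. A minimal sketch, with the function name and tensor shape chosen for illustration:

```python
import torch

def xavier_uniform_(weight: torch.Tensor) -> torch.Tensor:
    """Glorot & Bengio (2010) uniform initialization: Var(W) = 2 / (fan_in + fan_out)."""
    fan_out, fan_in = weight.shape               # convention: (out_features, in_features)
    bound = (6.0 / (fan_in + fan_out)) ** 0.5    # Uniform[-bound, bound] has the target variance
    with torch.no_grad():
        return weight.uniform_(-bound, bound)

w = xavier_uniform_(torch.empty(128, 64))        # matches torch.nn.init.xavier_uniform_ for 2D weights
```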
Orthogonal Deep Neural Networks
TLDR
This paper proves that DNNs are locally isometric on data distributions of practical interest, and establishes a new generalization error bound that is both scale- and range-sensitive to the singular value spectrum of each of the network's weight matrices.
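One common way to keep weight matrices close to the well-conditioned spectra that such a bound rewards is a soft orthogonality penalty. The generic ||W W^T - I||_F^2 form below is an illustration of the idea, not the specific regularization scheme used in that paper; the 0.01 weighting is an arbitrary assumption.

```python
import torch

def soft_orthogonality_penalty(weight: torch.Tensor) -> torch.Tensor:
    """||W W^T - I||_F^2: zero exactly when the rows of W are orthonormal (all singular values 1)."""
    gram = weight @ weight.T
    return ((gram - torch.eye(weight.shape[0], device=weight.device)) ** 2).sum()

W = torch.randn(64, 128, requires_grad=True)
(0.01 * soft_orthogonality_penalty(W)).backward()  # in practice, added to the task loss
```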
...