Do We Really Need a Learnable Classifier at the End of Deep Neural Network?

@article{Yang2022DoWR,
  title={Do We Really Need a Learnable Classifier at the End of Deep Neural Network?},
  author={Yibo Yang and Liangru Xie and Shixiang Chen and Xiangtai Li and Zhouchen Lin and Dacheng Tao},
  journal={ArXiv},
  year={2022},
  volume={abs/2203.09081}
}
Modern deep neural networks for classification usually jointly learn a backbone for representation and a linear classifier to output the logit of each class. A recent study has revealed a phenomenon called neural collapse, in which the within-class means of features and the classifier vectors converge to the vertices of a simplex equiangular tight frame (ETF) during the terminal phase of training on a balanced dataset. Since the ETF geometric structure maximally separates the pair-wise angles of all classes… 
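
For concreteness, a simplex ETF with K classes in d dimensions (d ≥ K) can be written as M = sqrt(K/(K-1)) · U (I_K − (1/K) 1_K 1_K^T), where U has orthonormal columns, so every column of M has unit norm and every pair of columns has inner product −1/(K−1). The following numpy sketch (illustrative, not the authors' code) constructs such a matrix and uses it as a fixed, non-learnable classifier:

import numpy as np

def simplex_etf(d, K, seed=0):
    """Construct a d x K simplex equiangular tight frame (ETF).

    Columns have unit norm and pairwise inner product -1/(K-1),
    i.e. the maximally and equally separated configuration of K
    class vectors. This construction assumes d >= K.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal basis U in R^{d x K} via reduced QR.
    U, _ = np.linalg.qr(rng.standard_normal((d, K)))
    M = np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)
    return M  # shape (d, K); kept frozen instead of being learned

M = simplex_etf(d=512, K=10)
logits = np.random.randn(32, 512) @ M        # logits for a batch of 32 features
print(np.round(M.T @ M, 3))                  # ~1 on diagonal, ~-1/(K-1) off-diagonal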

Citations

Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning

TLDR
This paper proposes the Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the classifier-weight gradients, achieving state-of-the-art performance with one-stage training rather than the two-stage schemes used by current state-of-the-art methods.

Neural Collapse with Normalized Features: A Geometric Analysis over the Riemannian Manifold

TLDR
This work theoretically justifies the neural collapse phenomenon for normalized features, simplifying the empirical loss of a multi-class classification task into a nonconvex optimization problem over a Riemannian manifold by constraining all features and classifiers to the sphere.
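
A hedged sketch of the constrained problem this summary refers to (my notation; the paper's exact formulation, e.g. scaling or temperature factors, may differ): both the free last-layer features and the classifier vectors are restricted to the unit sphere, so the training loss becomes a nonconvex problem over a product of spheres, which is a Riemannian manifold:

\[
\min_{\{h_i\},\,\{w_k\}} \ \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}_{\mathrm{CE}}\!\left(W h_i,\ y_i\right)
\quad \text{s.t.} \quad \|h_i\|_2 = 1,\ \ \|w_k\|_2 = 1 \quad \forall\, i, k,
\]

where W stacks the classifier vectors w_1, ..., w_K as rows and the features h_i are treated as free optimization variables.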

Imbalance Trouble: Revisiting Neural-Collapse Geometry

TLDR
This work adopts the unconstrained-features model (UFM) and introduces Simplex-Encoded-Labels Interpolation (SELI) as an invariant characterization of the neural collapse phenomenon, proving for the UFM with cross-entropy loss and vanishing regularization that, irrespective of class imbalance, the embeddings and classifiers always interpolate a simplex-encoded label matrix.

Neural Collapse: A Review on Modelling Principles and Generalization

TLDR
This work analyzes, from the ground up, the principles that aid in modelling this phenomenon, and shows how they build a common understanding of recently proposed models that try to explain neural collapse (NC).

ProxyMix: Proxy-based Mixup Training with Label Refinery for Source-Free Domain Adaptation

TLDR
This work proposes an effective method named Proxy-based Mixup training with label refinery (ProxyMix), which defines the classifier weights as class prototypes and constructs a class-balanced proxy source domain from the nearest neighbors of the prototypes to bridge the unseen source domain and the target domain.

References

Showing 1-10 of 47 references

Building a Regular Decision Boundary with Deep Networks

  • Edouard Oyallon
  • Computer Science
  • 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2017
TLDR
This work builds a generic Convolutional Neural Network architecture to discover empirical properties of neural networks, and shows that the nonlinearity of a deep network does not need to be continuous, non-expansive, or point-wise to achieve good performance.

Extended Unconstrained Features Model for Exploring Deep Neural Collapse

TLDR
This paper studies the UFM with a regularized MSE loss and shows that the minimizers' features can have a more delicate structure than in the cross-entropy case; it further extends the model with another layer of weights and a ReLU nonlinearity, generalizing previous results.
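
For context, the unconstrained features model (UFM) with a regularized MSE loss mentioned here can be sketched as follows (my notation, a common form in this line of work; the paper's exact variant, e.g. with a bias term or the additional layer, may differ):

\[
\min_{W,\,H}\ \ \frac{1}{2N}\,\bigl\|W H - Y\bigr\|_F^2 \;+\; \frac{\lambda_W}{2}\,\|W\|_F^2 \;+\; \frac{\lambda_H}{2}\,\|H\|_F^2,
\]

where H = [h_1, ..., h_N] collects the freely optimized ("unconstrained") last-layer features, W is the linear classifier, and Y is the one-hot label matrix.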

Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training

TLDR
The Layer-Peeled Model is introduced, a nonconvex, yet analytically tractable, optimization program that inherits many characteristics of well-trained neural networks, thereby offering an effective tool for explaining and predicting common empirical patterns of deep-learning training.
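
A hedged sketch of the Layer-Peeled Model in the notation commonly used for it (constants and constraint forms may differ slightly from the paper): the last-layer features are "peeled off" as free variables and optimized jointly with the classifier under norm constraints,

\[
\min_{W,\,H}\ \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k} \mathcal{L}_{\mathrm{CE}}\!\left(W h_{k,i},\ y_k\right)
\quad \text{s.t.} \quad \frac{1}{K}\sum_{k=1}^{K}\|w_k\|_2^2 \le E_W,\qquad
\frac{1}{K}\sum_{k=1}^{K}\frac{1}{n_k}\sum_{i=1}^{n_k}\|h_{k,i}\|_2^2 \le E_H,
\]

where n_k is the number of training samples in class k; the minority collapse studied in the paper arises when the n_k are highly imbalanced.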

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

TLDR
This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent work: a tighter characterization of training speed, an explanation for why training a neural net with random labels leads to slower training, and a data-dependent complexity measure.

Prevalence of neural collapse during the terminal phase of deep learning training

TLDR
This work considers a now-standard training methodology, driving the cross-entropy loss toward zero and continuing long after the classification error is already zero, and shows that neural collapse becomes prevalent during this terminal phase, helping to explain an important component of the modern deep learning training paradigm.
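
The neural collapse phenomenon documented in that work is commonly summarized by four interrelated properties (paraphrased here; see the paper for the precise statements):

\[
\begin{aligned}
&\text{(NC1)}\ \ \Sigma_W \to 0 &&\text{within-class variability of last-layer features collapses;}\\
&\text{(NC2)}\ \ \{\mu_k - \mu_G\}_{k=1}^{K} \to \text{simplex ETF} &&\text{centered class means become equinorm and maximally separated;}\\
&\text{(NC3)}\ \ w_k \propto \mu_k - \mu_G &&\text{classifier vectors align with the centered class means;}\\
&\text{(NC4)}\ \ \arg\max_k \langle w_k, h\rangle \to \arg\min_k \|h - \mu_k\|_2 &&\text{predictions reduce to the nearest class-mean rule,}
\end{aligned}
\]

where μ_k is the mean feature of class k and μ_G is the global feature mean.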

Dissecting Supervised Contrastive Learning

TLDR
This work proves, under mild assumptions, that both losses attain their minimum once the representations of each class collapse to the vertices of a regular simplex, inscribed in a hypersphere.

A Geometric Analysis of Neural Collapse with Unconstrained Features

TLDR
It is shown that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the simplex ETFs, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions.
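
The landscape result summarized here concerns the cross-entropy unconstrained-features objective with weight decay applied to the features, the classifier, and the bias; a hedged sketch in my notation:

\[
\min_{W,\,H,\,b}\ \ \frac{1}{N}\sum_{i=1}^{N}\mathcal{L}_{\mathrm{CE}}\!\left(W h_i + b,\ y_i\right)
\;+\; \frac{\lambda_W}{2}\,\|W\|_F^2 \;+\; \frac{\lambda_H}{2}\,\|H\|_F^2 \;+\; \frac{\lambda_b}{2}\,\|b\|_2^2,
\]

for which the cited result says that every global minimizer forms a simplex ETF and all other critical points are strict saddles.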

Revealing the Structure of Deep Neural Networks via Convex Duality

TLDR
It is shown that a set of optimal hidden-layer weights for a norm-regularized DNN training problem can be explicitly found as the extreme points of a convex set, and it is proved that each optimal weight matrix is rank-K and aligns with the previous layers via duality.

Deep Residual Learning for Image Recognition

TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Decoupling Representation and Classifier for Long-Tailed Recognition

TLDR
It is shown that it is possible to outperform carefully designed losses, sampling strategies, even complex modules with memory, by using a straightforward approach that decouples representation and classification.
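
One simple decoupled scheme from that line of work is τ-normalization of the classifier weights after representation learning; below is a minimal numpy sketch with illustrative names (not the authors' code):

import numpy as np

def tau_normalize(W, tau=1.0, eps=1e-12):
    """Rescale each class vector w_k by 1 / ||w_k||^tau.

    tau = 0 leaves the classifier unchanged; tau = 1 fully equalizes
    the per-class weight norms, counteracting the larger norms that
    head classes tend to acquire under long-tailed training.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)   # W has shape (K, d)
    return W / np.maximum(norms, eps) ** tau

# Usage: rebalance a classifier trained on a long-tailed dataset,
# then score features with the adjusted weights.
K, d = 10, 512
W = np.random.randn(K, d)
W_balanced = tau_normalize(W, tau=0.9)
logits = np.random.randn(32, d) @ W_balanced.T         # shape (32, K)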