Corpus ID: 232269962

The Low-Rank Simplicity Bias in Deep Networks

@article{Huh2021TheLS,
  title={The Low-Rank Simplicity Bias in Deep Networks},
  author={Minyoung Huh and Hossein Mobahi and Richard Zhang and Brian Cheung and Pulkit Agrawal and Phillip Isola},
  journal={ArXiv},
  year={2021},
  volume={abs/2103.10427}
}
Modern deep neural networks are highly over-parameterized compared to the data on which they are trained, yet they often generalize remarkably well. A flurry of recent work has asked: why do deep networks not overfit to their training data? In this work, we make a series of empirical observations that investigate and extend the hypothesis that deeper networks are inductively biased to find solutions with lower effective rank embeddings. We conjecture that this bias exists because the volume of…
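As a concrete illustration of the effective-rank notion invoked in the abstract, the following minimal numpy sketch (not the authors' code; the Roy & Vetterli effective-rank measure, the layer width, and the initialization scale are assumptions made for this example) shows that the end-to-end map of a deeper randomly initialized linear network tends to have lower effective rank:

import numpy as np

def effective_rank(M, eps=1e-12):
    # Roy & Vetterli effective rank: exp of the Shannon entropy of the
    # normalized singular-value distribution.
    s = np.linalg.svd(M, compute_uv=False)
    p = s / (s.sum() + eps)
    return np.exp(-(p * np.log(p + eps)).sum())

rng = np.random.default_rng(0)
d = 128  # width of each square linear layer (arbitrary choice for the sketch)

for depth in (1, 2, 4, 8, 16):
    # End-to-end map of a deep linear network at random initialization:
    # W = W_depth @ ... @ W_1, with each entry drawn from N(0, 1/d).
    W = np.eye(d)
    for _ in range(depth):
        W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d)) @ W
    print(f"depth {depth:2d}: effective rank ~ {effective_rank(W):6.1f}")

The printed effective rank drops as depth grows, which is the flavor of simplicity bias the paper studies (here only at initialization, not after training).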
Training invariances and the low-rank phenomenon: beyond linear networks
TLDR
This paper extends the theoretical result that, when one trains a deep linear network with logistic or exponential loss on linearly separable data, the weights converge to rank-1 matrices, to the last few linear layers of the much wider class of nonlinear ReLU-activated feedforward networks containing fully-connected layers and skip connections.
Machine Learning and Deep Learning -- A review for Ecologists
TLDR
A comprehensive overview of ML and DL is provided, starting with their historical developments, their algorithm families, their differences from traditional statistical tools, and universal ML principles, highlighting current and emerging applications for ecological problems.
Overcoming the Spectral Bias of Neural Value Approximation
TLDR
This work re-examines off-policy reinforcement learning through the lens of kernel regression and proposes to overcome the spectral bias with a composite neural tangent kernel built from Fourier feature networks.
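The Fourier feature ingredient named in this summary is, as far as one can tell from the title and abstract, the random Fourier feature input mapping popularized by Tancik et al.; a minimal sketch of that encoding follows (illustrative only; the bandwidth sigma and the sizes are arbitrary choices, and the downstream value network is not shown):

import numpy as np

def fourier_features(x, B):
    # Random Fourier feature encoding: gamma(x) = [cos(2*pi*x@B), sin(2*pi*x@B)].
    proj = 2.0 * np.pi * x @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)

rng = np.random.default_rng(0)
n, d_in, m, sigma = 256, 1, 64, 10.0     # sigma sets the frequency bandwidth (assumed value)
x = rng.uniform(-1.0, 1.0, size=(n, d_in))
B = rng.normal(0.0, sigma, size=(d_in, m))

features = fourier_features(x, B)        # shape (256, 128); these would feed the MLP / value network
print(features.shape)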
Neural Fields in Visual Computing
TLDR
A review of the literature on neural fields shows the breadth of topics already covered in visual computing, both historically and in current incarnations, and highlights the improved quality, flexibility, and capability brought by neural field methods.
A Falsificationist Account of Artificial Neural Networks
TLDR
It is argued that the idea of falsification is central to the methodology of machine learning and taking both aspects together gives rise to a falsificationist account of artificial neural networks.
SGD Noise and Implicit Low-Rank Bias in Deep Neural Networks
TLDR
It is shown, both theoretically and empirically, that when training a neural network using Stochastic Gradient Descent (SGD) with a small batch size, the resulting weight matrices are expected to be of small rank.
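One way to probe this claim empirically is to train the same small model with different batch sizes and compare the effective rank of a weight matrix afterwards. The sketch below is a toy two-layer linear regression model with hand-written SGD updates; all sizes, the learning rate, and the step count are arbitrary choices, and how large a rank gap appears (if any) depends on them. It illustrates the measurement, not a reproduction of the result:

import numpy as np

def effective_rank(M, eps=1e-12):
    # Exp of the entropy of the normalized singular-value distribution.
    s = np.linalg.svd(M, compute_uv=False)
    p = s / (s.sum() + eps)
    return np.exp(-(p * np.log(p + eps)).sum())

def train(batch_size, steps=5000, lr=0.01, seed=0):
    # Toy model y ~ x @ W1 @ W2, trained with plain mini-batch SGD
    # on a fixed synthetic regression dataset.
    rng = np.random.default_rng(seed)
    d, h, k, n = 32, 64, 16, 2048
    X = rng.normal(size=(n, d))
    T = rng.normal(size=(d, k)) / np.sqrt(d)        # teacher map
    Y = X @ T + 0.1 * rng.normal(size=(n, k))       # noisy targets
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, h))
    W2 = rng.normal(0.0, 1.0 / np.sqrt(h), size=(h, k))
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch_size)
        Xb, Yb = X[idx], Y[idx]
        H = Xb @ W1
        R = H @ W2 - Yb                             # residual on the mini-batch
        gW2 = 2.0 / batch_size * H.T @ R
        gW1 = 2.0 / batch_size * Xb.T @ R @ W2.T
        W1 -= lr * gW1
        W2 -= lr * gW2
    return effective_rank(W1)

# The summarized result predicts lower rank for the smaller batch size.
for b in (4, 256):
    print(f"batch size {b:3d}: effective rank of W1 ~ {train(b):5.1f}")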
On the Origins of the Block Structure Phenomenon in Neural Network Representations
TLDR
This work investigates the origin of the block structure in relation to the data and training methods, and finds that it arises from dominant datapoints — a small group of examples that share similar image statistics (e.g. background color).
Lifting the Veil on Hyper-parameters for Value-based Deep Reinforcement Learning
TLDR
This study conducts an initial empirical investigation into a number of often-overlooked hyperparameters for value-based deep RL agents, demonstrating their varying levels of importance on a varied set of classic control environments.
...
...

References

Showing 1-10 of 91 references
Deep learning generalizes because the parameter-function map is biased towards simple functions
TLDR
This paper argues that the parameter-function map of many DNNs should be exponentially biased towards simple functions, and provides clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST.
On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
TLDR
This paper suggests that increasing depth can sometimes speed up optimization, and proves that it is mathematically impossible to obtain the acceleration effect of overparameterization via gradients of any regularizer.
Implicit Regularization in Matrix Factorization
TLDR
The paper conjectures, and gives theoretical evidence, that with small enough step sizes and initialization close enough to the origin, gradient descent on a full-dimensional factorization converges to the minimum nuclear norm solution.
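A small numpy sketch of the setting this summary describes (matrix completion with a full-dimensional factorization, tiny initialization, and plain gradient descent; the problem sizes, learning rate, step count, and observation ratio are arbitrary choices for illustration, not the paper's experiments):

import numpy as np

rng = np.random.default_rng(0)
n, true_rank = 30, 2
M = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, n))   # rank-2 target matrix
mask = rng.random(size=(n, n)) < 0.5                                    # observe roughly half the entries

# Full-dimensional factorization X = U @ V.T with tiny initialization,
# fit by plain gradient descent on the observed entries only.
scale, lr, steps = 1e-4, 0.2, 20000
U = scale * rng.normal(size=(n, n))
V = scale * rng.normal(size=(n, n))
for _ in range(steps):
    R = mask * (U @ V.T - M)            # residual restricted to observed entries
    gU = 2.0 / mask.sum() * R @ V
    gV = 2.0 / mask.sum() * R.T @ U
    U -= lr * gU
    V -= lr * gV

X = U @ V.T
sv = np.linalg.svd(X, compute_uv=False)
print("nuclear norm, GD solution :", round(sv.sum(), 2))
print("nuclear norm, ground truth:", round(np.linalg.svd(M, compute_uv=False).sum(), 2))
print("top singular values:", np.round(sv[:5], 2))   # only ~2 should be non-negligible

With a larger initialization scale the same code typically converges to a higher-rank interpolant of the observed entries, which is the contrast the conjecture is about.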
Deep Residual Learning for Image Recognition
TLDR
This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.
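For concreteness, a minimal PyTorch-style residual block in the spirit of that framework (an illustrative sketch, not the paper's exact architecture; the channel counts and the batch-normalization placement follow common practice):

import torch.nn as nn

class ResidualBlock(nn.Module):
    # The convolutional stack learns a residual F(x) that is added back
    # to the identity shortcut: output = ReLU(F(x) + x).
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # identity shortcut plus learned residual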
ImageNet Large Scale Visual Recognition Challenge
TLDR
The creation of this benchmark dataset and the advances in object recognition that have been possible as a result are described, and state-of-the-art computer vision accuracy is compared with human accuracy.
Implicit Regularization via Neural Feature Alignment
TLDR
This work highlights a regularization effect induced by a dynamical alignment of the neural tangent features introduced by Jacot et al., along a small number of task-relevant directions, by extrapolating a new analysis of Rademacher complexity bounds for linear models.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
TLDR
Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train.
Implicit Regularization in Deep Matrix Factorization
TLDR
This work studies the implicit regularization of gradient descent over deep linear neural networks for matrix completion and sensing, a model referred to as deep matrix factorization, and finds that adding depth to a matrix factorization enhances an implicit tendency towards low-rank solutions.
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
TLDR
This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural net with random labels leads to slower training, and a data-dependent complexity measure.
ImageNet classification with deep convolutional neural networks
TLDR
A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
...
...