What’s Hidden in a Randomly Weighted Neural Network?

Vivek Ramanujan, Mitchell Wortsman, Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Training a neural network is synonymous with learning the values of the weights. By contrast, we demonstrate that randomly weighted neural networks contain subnetworks that achieve impressive performance without the weight values ever being trained. Hidden in a randomly weighted Wide ResNet-50 is a subnetwork (with random weights) that is smaller than, but matches the performance of, a ResNet-34 trained on ImageNet. Not only do these "untrained subnetworks" exist, but we provide an algorithm to…
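The abstract mentions an algorithm for finding such subnetworks without training the weights. A minimal illustrative sketch of the general idea, assuming an edge-popup-style approach (a learnable score per weight; the forward pass keeps only the top-scoring fraction of the fixed random weights); function and variable names here are my own, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random weights for one linear layer; these are never updated.
W = rng.standard_normal((4, 8))

# Per-weight scores. In the actual algorithm these would be learned by
# gradient descent; here they are random for illustration only.
scores = rng.standard_normal(W.shape)

def supermask_forward(x, W, scores, keep_frac=0.5):
    """Forward pass using only the top `keep_frac` of weights by |score|."""
    k = int(round(keep_frac * scores.size))
    threshold = np.sort(np.abs(scores).ravel())[-k]
    mask = (np.abs(scores) >= threshold).astype(W.dtype)
    return x @ (W * mask).T

x = rng.standard_normal((2, 8))
y = supermask_forward(x, W, scores, keep_frac=0.5)
```

With `keep_frac=1.0` the mask keeps every weight, so the layer reduces to an ordinary dense forward pass; the subnetwork is entirely determined by the mask, never by changing `W`.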


Pruned Neural Networks are Surprisingly Modular

A measurable notion of modularity for multi-layer perceptrons (MLPs) is introduced, and it is found that training and weight pruning produce MLPs that are more modular than randomly initialized ones, and often significantly more modular than random MLPs with the same (sparse) distribution of weights.

Learning from Randomly Initialized Neural Network Features

The surprising result that randomly initialized neural networks are good feature extractors in expectation is presented, suggesting that certain structures that manifest in a trained model are already present at initialization.

Finding Dense Supermasks in Randomly Initialized Neural Networks

This work removes components from randomly weighted neural networks – neurons from fully connected layers – such that the loss of the networks decreases continuously, resulting in smaller, dense networks whose accuracy is higher than their initial version.

Bit-wise Training of Neural Network Weights

We introduce an algorithm where the individual bits representing the weights of a neural network are learned. This method allows training weights with integer values on arbitrary bit-depths and…

A Probabilistic Approach to Neural Network Pruning

This work theoretically studies the performance of two pruning techniques (random and magnitude-based) on FCNs and CNNs and establishes that there exist pruned networks with expressive power within any specified bound from the target network.
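Magnitude-based pruning, one of the two techniques this entry refers to, simply zeroes out the smallest-magnitude weights. A minimal sketch (the function name is mine, not from the paper):

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the `sparsity` fraction of smallest-magnitude weights."""
    k = int(round(sparsity * W.size))
    if k == 0:
        return W.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.sort(np.abs(W).ravel())[k - 1]
    pruned = W.copy()
    pruned[np.abs(W) <= threshold] = 0.0
    return pruned

W = np.array([[0.1, -2.0],
              [0.5, -0.05]])
# Pruning half the weights removes 0.1 and -0.05, keeping -2.0 and 0.5.
print(magnitude_prune(W, 0.5))
```

Random pruning, by contrast, would choose the zeroed entries uniformly at random rather than by magnitude.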

What’s Hidden in a One-layer Randomly Weighted Transformer?

We demonstrate that, hidden within one-layer randomly weighted neural networks, there exist subnetworks that can achieve impressive performance, without ever modifying the weight initializations, on…

Training highly effective connectivities within neural networks with randomly initialized, fixed weights

Novel, straightforward methods for training the connection graph of a randomly initialized neural network, without training the weights, are presented; they shed light on the over-parameterization of neural networks and on how networks may be reduced to their effective size.

Evolutionary strategies for finding sparse randomly weighted subnetworks

This project revisits evolutionary learning as a method for finding high-performing sparse subnetworks in randomly initialized neural networks, and shows that such sparse networks can be found for a subset of control and locomotion problems from OpenAI's Gym using a simple, highly parallelizable evolutionary algorithm.

Mining the Weights Knowledge for Optimizing Neural Network Structures

Inspired by how learning works in the mammalian brain, a switcher neural network is introduced that takes as input the weights of a task-specific neural network (TNN for short) and mines the knowledge contained in those weights for automatic architecture learning.

Hidden-Fold Networks: Random Recurrent Residuals Using Sparse Supermasks

By first folding ResNet into a recurrent structure and then searching for an accurate subnetwork hidden within the randomly initialized model, a high-performing yet tiny HFN is obtained without ever updating the weights.



Weight Agnostic Neural Networks

This work proposes a search method for neural network architectures that can already perform a task without any explicit weight training, and demonstrates that this method can find minimal neural network architectures that perform several reinforcement learning tasks without weight training.

Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing

This paper proposes fixing almost all layers of a deep convolutional neural network, allowing only a small portion of the weights to be learned, and suggests practical ways to harness this to create more robust and compact representations.

Learning representations by back-propagating errors

Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
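The weight-update rule this summary describes can be illustrated in its simplest setting: gradient descent on the squared difference between actual and desired outputs. This sketch uses a single linear layer (so no hidden units and no chained gradients); the data and names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w  # desired output vector for each input

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    err = X @ w - y                # actual minus desired output
    grad = 2 * X.T @ err / len(X)  # gradient of the mean squared error
    w -= lr * grad                 # repeated weight adjustment

print(np.round(w, 2))  # converges to [ 1.  -2.   0.5]
```

In a multi-layer network, back-propagation extends this by applying the chain rule to pass the error gradient through each layer in turn.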

Discovering Neural Wirings

DNW provides an effective mechanism for discovering sparse subnetworks of predefined architectures in a single training run, and can be regarded as unifying core aspects of the neural architecture search problem with sparse neural network learning.

A Simple Weight Decay Can Improve Generalization

It is proven that weight decay has two effects in a linear network, and it is shown how to extend these results to networks with hidden layers and non-linear units.
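In gradient-descent terms, weight decay simply adds a term proportional to the weight itself to each update, pulling weights toward zero. A minimal sketch for a linear model showing the resulting shrinkage (the setup and names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = X @ np.array([3.0, -1.0]) + 0.1 * rng.standard_normal(50)

def train(decay, steps=500, lr=0.05):
    w = np.zeros(2)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(X)
        w -= lr * (grad + decay * w)  # the decay term shrinks w each step
    return w

w_plain = train(decay=0.0)
w_decay = train(decay=1.0)
print(np.linalg.norm(w_decay) < np.linalg.norm(w_plain))  # True
```

For a linear model this update converges to the ridge-regression solution, whose norm is strictly smaller than that of the unregularized solution.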

Understanding the difficulty of training deep feedforward neural networks

The objective here is to understand better why standard gradient descent from random initialization is doing so poorly with deep neural networks, to better understand these recent relative successes and help design better algorithms in the future.

Neural Architecture Search with Reinforcement Learning

This paper uses a recurrent network to generate the model descriptions of neural networks and trains this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set.

Exploring Randomly Wired Neural Networks for Image Recognition

The results suggest that new efforts focusing on designing better network generators may lead to new breakthroughs by exploring less constrained search spaces with more room for novel design.

Learning Multiple Layers of Features from Tiny Images

It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.

Wide Residual Networks

This paper conducts a detailed experimental study of the architecture of ResNet blocks and proposes a novel architecture in which the depth of residual networks is decreased and their width increased; the resulting structures, called wide residual networks (WRNs), are far superior to their commonly used thin and very deep counterparts.