Corpus ID: 238634309

Mining the Weights Knowledge for Optimizing Neural Network Structures

Mengqiao Han, Xiabi Liu, Zhaoyang Hai, Xin Duan
Knowledge embedded in the weights of an artificial neural network can be used to improve the network structure, for example in network compression. However, this knowledge is typically specified by hand, which may be inaccurate and may overlook relevant information. Inspired by how learning works in the mammalian brain, we mine the knowledge contained in the weights of a neural network toward automatic architecture learning. We introduce a switcher neural network (SNN) that uses as…


DeepCompNet: A Novel Neural Net Model Compression Architecture
Reports an innovative hybrid compression pipeline for neural networks that exploits the untapped potential of the z-score for weight pruning, followed by quantization using DBSCAN clustering and Huffman encoding.
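The summary does not spell out how the z-score is applied; one plausible reading, sketched here in NumPy with illustrative names and thresholds, is to standardize the weights and prune those that are not statistical outliers.

```python
import numpy as np

def zscore_prune(weights, z_thresh=1.0):
    # Standardize the weights and keep only those more than z_thresh
    # standard deviations from the mean (i.e. the statistical outliers).
    mu, sigma = weights.mean(), weights.std()
    mask = np.abs((weights - mu) / sigma) > z_thresh
    return weights * mask, mask

w = np.array([0.02, -0.01, 0.9, 0.03, -0.85, 0.01])
pruned, mask = zscore_prune(w)  # only the two large weights survive
```

The surviving weights could then be clustered (e.g. with DBSCAN) and Huffman-coded, as the pipeline describes.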
Learning both Weights and Connections for Efficient Neural Network
A method to reduce the storage and computation required by neural networks by an order of magnitude, without affecting their accuracy, by learning only the important connections; redundant connections are pruned using a three-step method.
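The three-step recipe (train, prune small-magnitude connections, retrain with the mask fixed) can be sketched as magnitude-based masking; the sparsity level and array values here are illustrative, not taken from the paper.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the fraction `sparsity` of weights with the smallest magnitude.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.array([[0.9, -0.05, 0.3],
              [-0.01, 0.7, -0.2]])
pruned, mask = magnitude_prune(w, 0.5)
# During retraining, gradients would be multiplied by `mask`
# so that pruned connections stay at zero.
```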
Keeping the neural networks simple by minimizing the description length of the weights
Describes a method for computing, without time-consuming Monte Carlo simulations, the derivatives of the expected squared error and of the amount of information in the noisy weights of a network containing a layer of non-linear hidden units.
Discovering Neural Wirings
DNW provides an effective mechanism for discovering sparse subnetworks of predefined architectures in a single training run, unifying core aspects of the neural architecture search problem with sparse neural network learning.
GASL: Guided Attention for Sparsity Learning in Deep Neural Networks
Proposes Guided Attention for Sparsity Learning (GASL) to achieve model compression, reducing the number of elements and speeding up inference, and introduces a generic mechanism that can be adapted to any type of architecture.
Learning Sparse Neural Networks through L0 Regularization
A practical method for L0-norm regularization of neural networks: pruning the network during training by encouraging weights to become exactly zero, which allows straightforward and efficient learning of model structures with stochastic gradient descent and enables conditional computation in a principled way.
What’s Hidden in a Randomly Weighted Neural Network?
Empirically shows that as randomly weighted neural networks with fixed weights grow wider and deeper, an "untrained subnetwork" approaches the accuracy of a network with learned weights.
Learning the Number of Neurons in Deep Networks
Proposes a group sparsity regularizer on the parameters of the network, where each group is defined to act on a single neuron, and shows that this approach can reduce the number of parameters by up to 80% while retaining or even improving the network accuracy.
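A per-neuron group sparsity regularizer of this kind can be sketched as a group lasso over the columns of a weight matrix; the grouping convention here (one group per column, i.e. per output neuron) is an assumption for illustration.

```python
import numpy as np

def group_lasso_penalty(W):
    # One group per output neuron: the L2 norm of each column of W.
    # Summing the norms (rather than squared norms) drives whole
    # columns, and hence whole neurons, to exactly zero.
    return float(np.sum(np.linalg.norm(W, axis=0)))

W = np.array([[3.0, 0.0],
              [4.0, 0.0]])
penalty = group_lasso_penalty(W)  # column norms are 5.0 and 0.0
```

Adding this penalty (scaled by a regularization strength) to the training loss encourages entire neurons to switch off, which is what allows the neuron count itself to be learned.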
Distilling the Knowledge in a Neural Network
Shows that the acoustic model of a heavily used commercial system can be significantly improved by distilling the knowledge in an ensemble of models into a single model, and introduces a new type of ensemble composed of one or more full models and many specialist models that learn to distinguish fine-grained classes the full models confuse.
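The distillation objective behind this line of work is commonly written as a cross-entropy against the teacher's temperature-softened outputs; a minimal NumPy sketch, with temperature and logits chosen purely for illustration:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; subtracting the max is for stability.
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy between the teacher's softened distribution and
    # the student's; a higher T exposes more of the teacher's
    # "dark knowledge" about relative class similarities.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(-np.sum(p * np.log(q + 1e-12)))

teacher = [4.0, 1.0, 0.1]
student = [3.0, 1.5, 0.2]
loss = distillation_loss(student, teacher)
```

In practice this term is combined with the ordinary cross-entropy on the true labels, weighted by a mixing coefficient.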
N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning
Introduces a principled method for learning reduced network architectures in a data-driven way using reinforcement learning; it can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining performance similar to the input 'teacher' network.
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Finds that dense, randomly initialized, feed-forward networks contain subnetworks ("winning tickets") that, when trained in isolation, reach test accuracy comparable to the original network in a similar number of iterations, and articulates the "lottery ticket hypothesis".
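The hypothesis suggests a simple procedure: train, prune by magnitude, then rewind the surviving weights to their initial values and retrain. A NumPy sketch under that reading, with illustrative shapes and sparsity:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = rng.normal(size=(4, 4))          # saved initialization

def winning_ticket(trained, init, sparsity=0.5):
    # Mask out the smallest-magnitude trained weights, then rewind
    # the surviving weights to their original initial values.
    k = int(sparsity * trained.size)
    thresh = np.partition(np.abs(trained).ravel(), k - 1)[k - 1]
    mask = np.abs(trained) > thresh
    return init * mask, mask

# Stand-in for an actual training run on theta0.
trained = theta0 + rng.normal(scale=0.5, size=(4, 4))
ticket, mask = winning_ticket(trained, theta0)
# `ticket` keeps the original initial values only where the trained
# network ended up with large weights; it would now be retrained.
```

The paper's full procedure iterates this prune-and-rewind loop several times; a single round is shown here for brevity.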