Understanding and Training Deep Diagonal Circulant Neural Networks

Alexandre Araujo, Benjamin Négrevergne, Yann Chevaleyre, Jamal Atif
In this paper, we study deep diagonal circulant neural networks, that is, deep neural networks in which the weight matrices are products of diagonal and circulant matrices. Besides providing a theoretical analysis of their expressivity, we introduce principled techniques for training these models: we devise an initialization scheme and propose a smart use of non-linearity functions in order to train deep diagonal circulant networks. Furthermore, we show that these networks outperform recently…
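As a minimal sketch of the structure described in the abstract, the snippet below applies one diagonal-circulant product `D @ C @ x` using the FFT and checks it against the dense computation. The function names `circulant` and `diag_circ_apply`, the convention that the circulant matrix is defined by its first column, and the random test vectors are assumptions made for illustration; they are not taken from the paper, and the paper's initialization scheme and non-linearities are not shown.

```python
import numpy as np

def circulant(c):
    """Dense n x n circulant matrix whose first column is c (C[i, j] = c[(i - j) mod n])."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

def diag_circ_apply(d, c, x):
    """Compute diag(d) @ circulant(c) @ x in O(n log n) time."""
    # Multiplying by a circulant matrix is a circular convolution,
    # which becomes an elementwise product in the Fourier domain.
    conv = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real
    # The diagonal factor is an elementwise scaling.
    return d * conv

rng = np.random.default_rng(0)
n = 8
d, c, x = rng.standard_normal((3, n))  # hypothetical parameters and input

dense = np.diag(d) @ circulant(c) @ x   # O(n^2) reference computation
fast = diag_circ_apply(d, c, x)         # FFT-based computation
print(np.allclose(dense, fast))
```

Each such factor stores only `2n` parameters (`d` and `c`) instead of the `n * n` entries of a dense weight matrix, which is the source of the compactness of these models.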
1 Citation

Structured LISTA for Multidimensional Harmonic Retrieval
A structured LISTA-Toeplitz network is proposed, which imposes Toeplitz structure on the mutual inhibition matrices and applies linear convolution instead of the matrix-vector multiplications used in traditional LISTA.
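The summary above relies on the fact that multiplying by a Toeplitz matrix is equivalent to a linear convolution. The sketch below illustrates only that identity, not the LISTA-Toeplitz network itself; the helper names `toeplitz_from_kernel` and `toeplitz_matvec` and the kernel layout (a length `2n - 1` vector holding the diagonals from top-right to bottom-left) are assumptions chosen for the example.

```python
import numpy as np

def toeplitz_from_kernel(k, n):
    """Dense n x n Toeplitz matrix with T[i, j] = k[i - j + n - 1], where len(k) == 2n - 1."""
    i, j = np.indices((n, n))
    return k[i - j + n - 1]

def toeplitz_matvec(k, x):
    """Compute T @ x as a slice of the full linear convolution of k with x."""
    n = len(x)
    # np.convolve returns the full convolution of length 3n - 2;
    # entries n-1 .. 2n-2 correspond to the rows of T.
    return np.convolve(k, x)[n - 1:2 * n - 1]

rng = np.random.default_rng(1)
n = 6
k = rng.standard_normal(2 * n - 1)  # hypothetical Toeplitz kernel
x = rng.standard_normal(n)          # hypothetical input vector

dense = toeplitz_from_kernel(k, n) @ x
conv = toeplitz_matvec(k, x)
print(np.allclose(dense, conv))
```

The Toeplitz matrix is described by `2n - 1` numbers rather than `n * n`, so imposing this structure reduces the parameter count in the same spirit as the circulant factors of the main paper.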


ACDC: A Structured Efficient Linear Layer
A deep, differentiable, fully-connected neural network module is introduced, composed of diagonal matrices of parameters, $\mathbf{A}$ and $\mathbf{D}$, and the discrete cosine transform; it is also illustrated how ACDC could in principle be implemented with lenses and diffractive elements.
Implicit Regularization in Deep Matrix Factorization
This work studies the implicit regularization of gradient descent over deep linear neural networks for matrix completion and sensing, a model referred to as deep matrix factorization, and finds that adding depth to a matrix factorization enhances an implicit tendency towards low-rank solutions.
On the Expressive Power of Deep Fully Circulant Neural Networks
It is proved that the function space spanned by circulant networks of bounded depth includes the one spanned by dense networks with specific properties on their rank, which indicates the need for principled techniques for training these models.
Compression of Deep Neural Networks by Combining Pruning and Low Rank Decomposition
This approach achieves up to 57% higher model compression than either Tucker decomposition or filter pruning alone at similar accuracy for GoogleNet, and reduces FLOPs by up to 48%, thereby making inference faster.
Training compact deep learning models for video classification using circulant matrices
This paper builds on recent results at the crossroads of Linear Algebra and Deep Learning which demonstrate how imposing a structure on large weight matrices can be used to reduce the size of the model, and proposes very compact models for video classification based on state-of-the-art network architectures such as Deep Bag-of-Frames, NetVLAD and NetFisherVectors.
Constrained Optimization Based Low-Rank Approximation of Deep Neural Networks
It is empirically demonstrated that COBLA, whose constrained optimization problem is approximately solved by sequential quadratic programming, outperforms prior art using the SqueezeNet and VGG-16 architectures on the ImageNet dataset.
Learning Compressed Transforms with Low Displacement Rank
A rich class of LDR matrices with more general displacement operators is introduced, and both the operators and the low-rank component are learned explicitly; this approach exceeds the accuracy of existing compression approaches and on some tasks even outperforms general unstructured layers while using more than 20X fewer parameters.
The Singular Values of Convolutional Layers
It is shown that this is an effective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%.