# Understanding and Training Deep Diagonal Circulant Neural Networks

@inproceedings{Araujo2020UnderstandingAT,
title={Understanding and Training Deep Diagonal Circulant Neural Networks},
author={Alexandre Araujo and Benjamin N{\'e}grevergne and Yann Chevaleyre and Jamal Atif},
booktitle={ECAI},
year={2020}
}
• Published in ECAI 29 January 2019
• Computer Science
In this paper, we study deep diagonal circulant neural networks, that is, deep neural networks in which weight matrices are the product of diagonal and circulant ones. Besides making a theoretical analysis of their expressivity, we introduce principled techniques for training these models: we devise an initialization scheme and propose a smart use of non-linearity functions in order to train deep diagonal circulant networks. Furthermore, we show that these networks outperform recently…
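The weight structure named in the abstract can be illustrated with a minimal sketch. It assumes NumPy, assumes the convention that a circulant matrix is defined by its first column, and assumes the factor order diag(d) · circ(c); the function names are hypothetical and not taken from the paper. The FFT shortcut is the standard identity for circulant products, not something the abstract states.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply circ(c) by x via FFT, where c is the first column (assumed convention)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def diag_circ_layer(d, c, x):
    """Apply W = diag(d) @ circ(c) to x without forming the n x n matrix W."""
    return d * circulant_matvec(c, x)

def dense_circulant(c):
    """Build circ(c) explicitly, entry (i, j) = c[(i - j) mod n]; used only for checking."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(0)
n = 8
d, c, x = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)

fast = diag_circ_layer(d, c, x)              # uses 2n parameters (d and c)
slow = np.diag(d) @ dense_circulant(c) @ x   # same product with explicit matrices
max_err = np.max(np.abs(fast - slow))        # should be at floating-point noise level
```

In this sketch each layer stores 2n numbers instead of the n² entries of a dense weight matrix, which is the kind of compression that structured layers of this type aim for.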
1 Citation

Structured LISTA for Multidimensional Harmonic Retrieval
• Computer Science
IEEE Transactions on Signal Processing
• 2021
A structured LISTA-Toeplitz network is proposed, which imposes Toeplitz structure on the mutual inhibition matrices and applies linear convolution instead of the matrix-vector multiplications used in traditional LISTA.

## References

ACDC: A Structured Efficient Linear Layer
• Computer Science
ICLR
• 2016
A deep, differentiable, fully-connected neural network module composed of diagonal matrices of parameters, $\mathbf{A}$ and $\mathbf{D}$, and the discrete cosine transform, illustrating how ACDC could in principle be implemented with lenses and diffractive elements.
Implicit Regularization in Deep Matrix Factorization
• Computer Science
NeurIPS
• 2019
This work studies the implicit regularization of gradient descent over deep linear neural networks for matrix completion and sensing, a model referred to as deep matrix factorization, and finds that adding depth to a matrix factorization enhances an implicit tendency towards low-rank solutions.
On the Expressive Power of Deep Fully Circulant Neural Networks
• Computer Science
ArXiv
• 2019
It is proved that the function space spanned by circulant networks of bounded depth includes the one spanned by dense networks with specific properties on their rank, which indicates the need for principled techniques for training these models.
Compression of Deep Neural Networks by Combining Pruning and Low Rank Decomposition
• Computer Science
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
• 2019
This approach achieves up to 57% higher model compression than either Tucker decomposition or filter pruning alone at similar accuracy for GoogleNet, and reduces FLOPs by up to 48%, thereby making inference faster.
Training compact deep learning models for video classification using circulant matrices
• Computer Science
ECCV Workshops
• 2018
This paper builds on recent results at the crossroads of Linear Algebra and Deep Learning which demonstrate how imposing a structure on large weight matrices can be used to reduce the size of the model, and proposes very compact models for video classification based on state-of-the-art network architectures such as Deep Bag-of-Frames, NetVLAD and NetFisherVectors.
Constrained Optimization Based Low-Rank Approximation of Deep Neural Networks
• Computer Science
ECCV
• 2018
It is empirically demonstrated that COBLA outperforms prior art using the SqueezeNet and VGG-16 architectures on the ImageNet dataset; the underlying constrained optimization problem is approximately solved by sequential quadratic programming.
Learning Compressed Transforms with Low Displacement Rank
• Computer Science
NeurIPS
• 2018
A rich class of LDR matrices with more general displacement operators is introduced, and both the operators and the low-rank component are learned explicitly; this approach exceeds the accuracy of existing compression approaches and on some tasks even outperforms general unstructured layers while using more than 20X fewer parameters.
The Singular Values of Convolutional Layers
• Computer Science, Mathematics
ICLR
• 2019
It is shown that this is an effective regularizer; for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%.