Training compact deep learning models for video classification using circulant matrices

@inproceedings{Araujo2018TrainingCD,
  title={Training compact deep learning models for video classification using circulant matrices},
  author={Alexandre Araujo and Benjamin N{\'e}grevergne and Yann Chevaleyre and Jamal Atif},
  booktitle={ECCV Workshops},
  year={2018}
}
In real-world scenarios, model accuracy is hardly the only factor to consider. Large models consume more memory and are computationally more intensive, which makes them difficult to train and to deploy, especially on mobile devices. In this paper, we build on recent results at the crossroads of Linear Algebra and Deep Learning which demonstrate how imposing a structure on large weight matrices can reduce the size of the model. Building on these results, we propose very compact models…
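To make the abstract's central idea concrete: a circulant matrix is fully determined by its first column, so an n × n circulant layer stores n weights instead of n², and the matrix-vector product runs in O(n log n) via the FFT. A minimal NumPy sketch (function names are mine, not the paper's):

import numpy as np

def circulant_matvec(c: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Compute circ(c) @ x via the convolution theorem."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Sanity check against the explicit dense circulant matrix.
n = 8
c, x = np.random.randn(n), np.random.randn(n)
dense = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
assert np.allclose(dense @ x, circulant_matvec(c, x))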
On the Expressive Power of Deep Fully Circulant Neural Networks
TLDR
It is proved that the function space spanned by circulant networks of bounded depth includes the one spanned by dense networks with specific properties on their rank, which indicates the need for principled techniques for training these models.
Understanding and Training Deep Diagonal Circulant Neural Networks
TLDR
This paper designs an initialization scheme and proposes a smart use of non-linearity functions to train deep diagonal circulant networks, and shows that these networks outperform recently introduced deep networks with other types of structured layers.
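To illustrate what such a network computes, here is a hedged NumPy sketch of stacked diagonal-circulant layers, x ↦ ReLU(D · circ(c) · x), each costing 2n parameters. The near-identity initialization below (c = e₀ plus small noise, D = I) is my illustration of "an initialization scheme", not necessarily the paper's exact proposal:

import numpy as np

def dc_layer(d: np.ndarray, c: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One diagonal-circulant layer: relu(diag(d) @ circ(c) @ x)."""
    y = d * np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))
    return np.maximum(y, 0.0)

n, depth = 16, 5
rng = np.random.default_rng(0)
layers = []
for _ in range(depth):
    c = np.zeros(n); c[0] = 1.0          # circ(e_0) is the identity matrix
    c += 0.01 * rng.standard_normal(n)   # small perturbation around identity
    layers.append((np.ones(n), c))       # D = I at initialization

x = rng.standard_normal(n)
for d, c in layers:
    x = dc_layer(d, c, x)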
On the Expressive Power of Deep Diagonal Circulant Neural Networks
TLDR
It is proved that the function space spanned by diagonal circulant networks of bounded depth includes the one spanned by dense networks with specific properties on their rank, which indicates that the deep neural networks introduced in this paper outperform the recently introduced deep networks with other types of structured layers.
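For context, both expressivity results build on a known factorization of dense matrices into alternating diagonal and circulant factors (Huhtanen & Perämäki, 2015); the rank-dependent refinement below is paraphrased from the TLDRs above, not quoted from the papers:

% Almost every dense matrix is a product of 2n-1 alternating
% diagonal (D_i) and circulant (C_i) factors:
\begin{equation*}
  A \;=\; D_1 C_1 D_2 C_2 \cdots C_{n-1} D_n ,
  \qquad A \in \mathbb{C}^{n \times n}.
\end{equation*}
% The papers above sharpen this bound: when rank(A) is small, fewer
% factors (hence shallower diagonal-circulant networks) suffice.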
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition
TLDR
A novel approach, called AR-Net (Adaptive Resolution Network), is proposed that selects on the fly the optimal resolution for each frame, conditioned on the input, for efficient action recognition in long untrimmed videos.
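A hedged sketch of the adaptive-resolution idea (resolutions, shapes, and names are illustrative; AR-Net itself learns the choice end-to-end with a differentiable relaxation rather than the hard argmax used here):

import numpy as np

RESOLUTIONS = [224, 168, 112, 84]   # candidate side lengths (illustrative)

def choose_resolution(frame_feat: np.ndarray, policy_w: np.ndarray) -> int:
    """Pick one resolution per frame from the logits of a cheap policy net."""
    logits = policy_w @ frame_feat
    return RESOLUTIONS[int(np.argmax(logits))]

rng = np.random.default_rng(1)
policy_w = rng.standard_normal((len(RESOLUTIONS), 32))
for t in range(4):                    # a few frames of a video
    feat = rng.standard_normal(32)    # cheap per-frame descriptor
    print(t, choose_resolution(feat, policy_w))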
Soft-Label: A Strategy to Expand Dataset for Large-scale Fine-grained Video Classification
In this paper, the solution for The 3rd YouTube-8M Video Understanding Challenge is introduced. The final submission achieves a 0.78687 MAP score on the private leaderboard, ranking 10th. …
Towards deep neural network compression via learnable wavelet transforms
TLDR
This paper shows how the fast wavelet transform can be used to compress linear layers in neural networks; the resulting layers have significantly fewer parameters yet still perform competitively with the state of the art on synthetic and real-world RNN benchmarks.
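A minimal sketch of the compression idea, assuming a fixed orthonormal Haar transform H and learning only a diagonal, i.e. W = Hᵀ diag(g) H with n parameters instead of n²; the paper itself goes further and learns the wavelet filters:

import numpy as np

def haar_matrix(n: int) -> np.ndarray:
    """Orthonormal Haar transform matrix; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        top = np.kron(h, [1.0, 1.0])
        bot = np.kron(np.eye(h.shape[0]), [1.0, -1.0])
        h = np.vstack([top, bot]) / np.sqrt(2.0)
    return h

n = 8
H = haar_matrix(n)
g = np.random.randn(n)                  # the only learned parameters
x = np.random.randn(n)
y = H.T @ (g * (H @ x))                 # W @ x with W = H^T diag(g) H
assert np.allclose(H @ H.T, np.eye(n))  # H is orthonormal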
The 2nd YouTube-8M Large-Scale Video Understanding Challenge
TLDR
This paper briefly introduces the YouTube-8M dataset and challenge task, followed by participant statistics and result analysis, and summarizes ideas proposed by participants, including architectures, temporal aggregation methods, ensembling and distillation, data augmentation, and more.
Canonical convolutional neural networks
TLDR
The canonical re-parametrization achieves competitive performance on the MNIST, CIFAR10, and SVHN data sets and allows convenient model compression by truncating the parameter sums.
Neural Network Compression via Learnable Wavelet Transforms
TLDR
This paper shows how the fast wavelet transform can be used to compress linear layers in neural networks; the resulting layers have significantly fewer parameters yet still perform competitively with the state of the art on synthetic and real-world RNN benchmarks.
Technical Sound Event Classification Applying Recurrent and Convolutional Neural Networks
TLDR
A new approach, based on designing a neural-network concept and realizing it chiefly with convolutional networks, demonstrates the power of technical sound classification methods; first results show the strength of both RNNs and CNNs.
…

References

Showing 1–10 of 41 references.
Learnable pooling with Context Gating for video classification
TLDR
This work explores combinations of learnable pooling techniques such as Soft Bag-of-words, Fisher Vectors, NetVLAD, GRU and LSTM to aggregate video features over time, and introduces a learnable non-linear network unit, named Context Gating, aiming at modeling interdependencies between features.
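The Context Gating unit is compact enough to state directly: CG(x) = σ(Wx + b) ⊙ x, an elementwise gate that reweights feature dimensions based on the whole feature vector. A minimal NumPy version (weight shapes are mine):

import numpy as np

def context_gating(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """CG(x) = sigmoid(W @ x + b) * x, an elementwise feature gate."""
    gate = 1.0 / (1.0 + np.exp(-(W @ x + b)))  # gate values in (0, 1)
    return gate * x

d = 16
rng = np.random.default_rng(2)
x = rng.standard_normal(d)
y = context_gating(x, rng.standard_normal((d, d)), np.zeros(d))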
Structured Transforms for Small-Footprint Deep Learning
TLDR
In keyword spotting applications in mobile speech recognition, the proposed unified framework for learning a broad family of structured parameter matrices characterized by the notion of low displacement rank is much more effective than standard linear low-rank bottleneck layers and nearly retains the performance of state-of-the-art models.
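The low displacement rank notion from this TLDR can be made concrete for the Toeplitz-like case; the notation below follows the structured-matrix literature, and the paper's exact operator choice may differ:

% M is "structured" when a shift-based displacement of M has small rank:
\begin{equation*}
  \nabla(M) \;=\; Z_{1} M - M Z_{-1} \;=\; G H^{\top},
  \qquad G, H \in \mathbb{R}^{n \times r},\; r \ll n,
\end{equation*}
% where Z_f is the unit f-circulant shift matrix. Learning G and H
% instead of M costs 2nr parameters rather than n^2, and r interpolates
% between pure Toeplitz matrices (r <= 2) and unstructured ones.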
Beyond short snippets: Deep networks for video classification
TLDR
This work proposes and evaluates several deep neural network architectures to combine image information across a video over longer time periods than previously attempted, and proposes two methods capable of handling full-length videos.
Large-Scale Video Classification with Convolutional Neural Networks
TLDR
This work studies multiple approaches for extending the connectivity of a CNN in the time domain to take advantage of local spatio-temporal information and suggests a multiresolution, foveated architecture as a promising way of speeding up training.
Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks
TLDR
By arming the DNN with a better capability of harnessing both feature and class relationships, the proposed regularized DNN (rDNN) is better suited to modeling video semantics.
On Compressing Deep Models by Low Rank and Sparse Decomposition
TLDR
A unified framework integrating the low-rank and sparse decomposition of weight matrices with feature-map reconstruction is proposed, which can significantly reduce the parameters of both convolutional and fully-connected layers.
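A generic sketch of the low-rank-plus-sparse idea, alternating a truncated SVD with hard thresholding; this is not the paper's exact feature-map-aware algorithm:

import numpy as np

def low_rank_sparse(W: np.ndarray, k: int, tau: float, iters: int = 20):
    """Approximate W as L + S with rank(L) <= k and S sparse."""
    S = np.zeros_like(W)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :k] * s[:k]) @ Vt[:k]        # best rank-k fit to W - S
        R = W - L
        S = np.where(np.abs(R) > tau, R, 0.0)  # keep only large residuals
    return L, S

W = np.random.randn(64, 64)
L, S = low_rank_sparse(W, k=8, tau=0.5)
print(np.linalg.matrix_rank(L), float(np.mean(S != 0.0)))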
Memory Bounded Deep Convolutional Networks
TLDR
This work investigates the use of sparsity-inducing regularizers during training of Convolutional Neural Networks and shows that training with such regularization can still be performed using stochastic gradient descent, implying that it can be easily adopted in existing codebases.
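A minimal sketch of that point: add an l1 penalty to the loss and run plain (sub)gradient descent, with no special optimizer. The paper uses group-wise penalties on convolutional filters; this scalar regression version keeps the idea minimal:

import numpy as np

rng = np.random.default_rng(3)
X, w_true = rng.standard_normal((200, 50)), np.zeros(50)
w_true[:5] = rng.standard_normal(5)            # sparse ground truth
y = X @ w_true + 0.01 * rng.standard_normal(200)

w, lam, lr = np.zeros(50), 0.1, 0.01
for _ in range(500):
    grad = X.T @ (X @ w - y) / len(y) + lam * np.sign(w)  # l1 subgradient
    w -= lr * grad
print(int((np.abs(w) > 1e-2).sum()), "of 50 weights clearly nonzero")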
An Exploration of Parameter Redundancy in Deep Networks with Circulant Projections
We explore the redundancy of parameters in deep neural networks by replacing the conventional linear projection in fully-connected layers with the circulant projection. The circulant structure…
Sparse Convolutional Neural Networks
TLDR
This work shows how to reduce the redundancy in network parameters using a sparse decomposition, and proposes an efficient sparse matrix multiplication algorithm on CPU for Sparse Convolutional Neural Network (SCNN) models.
Ternary Neural Networks with Fine-Grained Quantization
TLDR
A novel fine-grained quantization (FGQ) method is proposed to ternarize pre-trained full-precision models while also constraining activations to 8 and 4 bits, enabling a full 8/4-bit inference pipeline with the best reported accuracy using ternary weights on the ImageNet dataset.
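A hedged sketch of ternarization with a per-group scale. The 0.7 · mean|w| threshold is the common ternary-weight-network heuristic and only loosely imitates FGQ's fine-grained grouping; activation quantization is omitted:

import numpy as np

def ternarize(w: np.ndarray) -> np.ndarray:
    """Map weights to {-alpha, 0, +alpha} with a per-group scale alpha."""
    delta = 0.7 * np.mean(np.abs(w))               # threshold heuristic
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

rng = np.random.default_rng(4)
W = rng.standard_normal((4, 64))
Wq = np.stack([ternarize(g) for g in W])  # one scale per group (row)
print(np.unique(np.round(Wq[0], 4)))      # exactly three levels per group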
…