A Variational Information Bottleneck Based Method to Compress Sequential Networks for Human Action Recognition

Ayush Srivastava, Oshin Dutta, A. P. Prathosh, Sumeet Agarwal, Jigyasa Gupta. 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
In recent years, the compression of deep neural networks has become an important strand of machine learning and computer vision research. Deep models incur substantial computational and storage costs when used, for instance, for Human Action Recognition (HAR) from videos, making them unsuitable for deployment on edge devices. In this paper, we address this issue and propose a method to effectively compress Recurrent Neural Networks (RNNs) such as Gated Recurrent Units (GRUs) and Long-Short…
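The paper's exact formulation is not reproduced on this page, but the variational information bottleneck (VIB) compression idea can be sketched: each hidden unit is gated by a learned stochastic multiplier, a KL penalty shrinks uninformative gates toward zero signal-to-noise, and low-importance units are then pruned. The function names and threshold below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch: each hidden unit h_i is multiplied by a stochastic
# gate z_i ~ N(mu_i, sigma_i^2); the VIB KL penalty log(1 + mu_i^2/sigma_i^2)
# drives uninformative units toward zero signal-to-noise, so they can be pruned.
def vib_importance(mu, sigma):
    return np.log1p(mu**2 / sigma**2)

def prune_mask(mu, sigma, threshold=1e-2):
    # Keep only units whose gates carry appreciable information.
    return vib_importance(mu, sigma) > threshold

mu = np.array([2.0, 0.01, 1.5, 0.0])     # learned gate means (toy values)
sigma = np.array([0.5, 1.0, 0.5, 1.0])   # learned gate std devs (toy values)
mask = prune_mask(mu, sigma)
print(mask)  # [ True False  True False]
```

Units two and four have negligible signal-to-noise ratio, so the mask drops them, shrinking the recurrent layer.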



Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition
A novel compact LSTM model, named TR-LSTM, is proposed, which uses low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation in RNNs; TRD is more stable than other tensor decomposition methods.
Tensor-Train Recurrent Neural Networks for Video Classification
A new, more general, and efficient approach that factorizes the input-to-hidden weight matrix using Tensor-Train decomposition, trained simultaneously with the weights themselves; this provides a novel and fundamental building block for modeling high-dimensional sequential data with RNN architectures.
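The parameter saving behind the Tensor-Train factorization can be illustrated with a minimal numpy sketch. The shapes and ranks below are toy assumptions: a 32×32 matrix with modes factored as (4·8)×(4·8) is stored as two small TT cores instead of one dense array.

```python
import numpy as np

# Sketch of the Tensor-Train (TT) matrix format: a weight matrix with
# factorized modes (m1*m2) x (n1*n2) is stored as cores G_k of shape
# (r_{k-1}, m_k, n_k, r_k), with boundary ranks r_0 = r_2 = 1.
m, n, rank = (4, 8), (4, 8), 3
rng = np.random.default_rng(0)
G1 = rng.standard_normal((1, m[0], n[0], rank))
G2 = rng.standard_normal((rank, m[1], n[1], 1))

# Contract the cores back into the full 32 x 32 matrix.
W = np.einsum('aijb,bklc->ikjl', G1, G2).reshape(m[0] * m[1], n[0] * n[1])

full_params = W.size           # 1024 for the dense matrix
tt_params = G1.size + G2.size  # 48 + 192 = 240 for the TT cores
print(full_params, tt_params)
```

Even at TT-rank 3, the cores hold roughly a quarter of the dense parameter count, and the gap widens as the matrix grows.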
Two Stream LSTM: A Deep Fusion Framework for Human Action Recognition
A deep fusion framework that more effectively exploits spatial features from CNNs together with temporal features from LSTM models, achieving high accuracy and outperforming current state-of-the-art methods on three widely used datasets: UCF11, UCFSports, and jHMDB.
Beyond short snippets: Deep networks for video classification
This work proposes and evaluates several deep neural network architectures that combine image information across a video over longer time periods than previously attempted, and proposes two methods capable of handling full-length videos.
Compressing Recurrent Neural Networks Using Hierarchical Tucker Tensor Decomposition
This paper proposes developing compact RNN models using Hierarchical Tucker (HT) decomposition and shows that the proposed HT-based LSTM (HT-LSTM) consistently achieves simultaneous and significant increases in both compression ratio and test accuracy across different datasets.
Long-term recurrent convolutional networks for visual recognition and description
A novel recurrent convolutional architecture suitable for large-scale visual learning that is end-to-end trainable; such models are shown to have distinct advantages over state-of-the-art recognition or generation models that are separately defined and/or optimized.
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
The proposed Block-Term RNN (BT-RNN) is not only more concise but also attains a better approximation to the original RNNs with far fewer parameters than alternative low-rank approximations.
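The "fewer parameters" trade-off can be illustrated with a simplified stand-in: approximating a dense weight matrix by a short sum of small factor pairs. This sketch uses a sum of Kronecker products rather than the exact block-term construction of BT-RNN; the shapes and term count are toy assumptions.

```python
import numpy as np

# Simplified illustration (not the exact BT-RNN construction): approximate
# a dense weight matrix by a short sum of Kronecker products, trading one
# large factor for several small ones.
rng = np.random.default_rng(0)
terms, a_shape, b_shape = 4, (4, 4), (8, 8)
A = [rng.standard_normal(a_shape) for _ in range(terms)]
B = [rng.standard_normal(b_shape) for _ in range(terms)]
W = sum(np.kron(a, b) for a, b in zip(A, B))  # dense shape: (32, 32)

dense_params = W.size                            # 1024
compact_params = terms * (A[0].size + B[0].size) # 4 * (16 + 64) = 320
print(dense_params, compact_params)
```

With four terms the factored form stores under a third of the dense parameters; increasing the number of terms trades compression for approximation quality.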
Effective Quantization Approaches for Recurrent Neural Networks
This paper proposes an effective quantization approach for Recurrent Neural Network (RNN) techniques, including Long Short-Term Memory (LSTM), Gated Recurrent Units (GRU), and Convolutional LSTM (ConvLSTM), and shows promising results for both sentiment analysis and video frame prediction.
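The paper's specific quantization scheme is not detailed here, but the general mechanism it builds on can be sketched: symmetric uniform quantization maps float weights to signed 8-bit integers plus one float scale per tensor, bounding the reconstruction error by half a quantization step. The helper names below are illustrative.

```python
import numpy as np

# Sketch of symmetric uniform quantization for RNN weight tensors:
# floats map to int8 levels in [-127, 127] with a single per-tensor scale.
def quantize(w, bits=8):
    scale = np.abs(w).max() / (2**(bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 16)).astype(np.float32)
q, scale = quantize(w)
max_err = np.abs(dequantize(q, scale) - w).max()
print(q.dtype, max_err <= scale / 2 + 1e-6)  # int8 True
```

Storing int8 weights plus one scale cuts memory roughly 4x versus float32, which is the main appeal for edge deployment of recurrent models.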
An attention mechanism based convolutional LSTM network for video action recognition
An attention-mechanism-based convolutional LSTM action recognition algorithm that improves recognition accuracy by effectively extracting the salient regions of actions in videos, and adopts temporal-coherence analysis to reduce the redundant features extracted by GoogleNet.
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
I3D models considerably improve upon the state of the art in action classification, reaching 80.2% on HMDB-51 and 97.9% on UCF-101 after pre-training on Kinetics; a new Two-Stream Inflated 3D ConvNet based on 2D ConvNet inflation is introduced.