Corpus ID: 233388124

Skip-Convolutions for Efficient Video Processing

@inproceedings{Habibian2021SkipConvolutionsFE,
  title={Skip-Convolutions for Efficient Video Processing},
  author={A. Habibian and Davide Abati and T. Cohen and B. E. Bejnordi},
  booktitle={CVPR},
  year={2021}
}
We propose Skip-Convolutions to leverage the large amount of redundancies in video streams and save computations. Each video is represented as a series of changes across frames and network activations, denoted as residuals. We reformulate standard convolution to be efficiently computed on residual frames: each layer is coupled with a binary gate deciding whether a residual is important to the model prediction, e.g. foreground regions, or it can be safely skipped, e.g. background regions. These…
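The core idea above can be sketched in a few lines: since convolution is linear, the output on the current frame equals the output on the previous frame plus the convolution of the inter-frame residual, so positions where the residual is negligible can be skipped. The sketch below uses a simple norm-based threshold gate as a stand-in for the paper's learned binary gate; `skip_conv`, `tau`, and the naive `conv2d` helper are illustrative names, not the authors' implementation, and the exact equivalence only holds for a linear layer (no activation in between).

```python
import numpy as np

def conv2d(x, w):
    # naive single-channel 'valid' 2D cross-correlation
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def skip_conv(prev_frame, cur_frame, prev_out, w, tau=1e-3):
    """Sketch of a skip-convolution step (assumed interface).

    Convolves only the residual between frames, and only updates
    output positions whose receptive field actually changed.
    """
    residual = cur_frame - prev_frame
    # stand-in gate: fire where the residual energy inside the
    # receptive field exceeds tau (the paper learns this gate)
    change = conv2d(np.abs(residual), np.ones_like(w))
    gate = change > tau
    # linearity: conv(cur) = conv(prev) + conv(residual)
    update = conv2d(residual, w)
    out = prev_out.copy()
    out[gate] += update[gate]
    return out, gate
```

With a static background and a single changed pixel, the gate fires only inside that pixel's receptive field, yet the gated result matches a dense convolution of the full current frame exactly, because every skipped position has a zero residual update.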
