CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams

@article{Cavigelli2020CBinferEF,
  title={CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams},
  author={Lukas Cavigelli and Luca Benini},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2020},
  volume={30},
  pages={1451--1465}
}
The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets. Applying these networks to images demands a high computational effort and pushes the use of state-of-the-art networks on real-time video data out of reach of embedded platforms. Many recent works focus on reducing network complexity for real-time inference on embedded computing platforms…
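The core idea of change-based inference described above can be illustrated with a short sketch. The function below is a hypothetical, much-simplified illustration (not the CBinfer implementation, which targets GPUs and multi-channel layers): it thresholds the per-pixel difference between consecutive frames and recomputes the convolution only at output pixels whose receptive field contains a changed input.

```python
import numpy as np

def change_based_conv2d(prev_in, curr_in, prev_out, kernel, threshold=0.1):
    """Minimal sketch of change-based inference: recompute a 2D 'same'
    convolution only at output pixels whose receptive field contains an
    input pixel that changed by more than `threshold` since the previous
    frame. `prev_out` is the cached output for `prev_in`."""
    k = kernel.shape[0]
    pad = k // 2
    h, w = curr_in.shape
    changed = np.abs(curr_in - prev_in) > threshold  # per-pixel change mask
    out = prev_out.copy()
    padded = np.pad(curr_in, pad)
    # Dilate the change mask by the kernel radius: every output pixel that
    # can "see" a changed input pixel must be updated.
    todo = set()
    for y, x in zip(*np.where(changed)):
        for dy in range(-pad, pad + 1):
            for dx in range(-pad, pad + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:
                    todo.add((yy, xx))
    for y, x in todo:  # sparse update: cost scales with the number of changes
        out[y, x] = np.sum(padded[y:y + k, x:x + k] * kernel)
    return out
```

With a static camera, most pixels are unchanged between frames, so `todo` stays small and the per-frame cost drops far below that of a full convolution.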
TempDiff: Temporal Difference-Based Feature Map-Level Sparsity Induction in CNNs with <4% Memory Overhead
  • Udari De Alwis, M. Alioto
  • Computer Science
  • 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS)
  • 2021
TLDR
A computationally efficient inference technique is introduced for the ubiquitous task of bounding-box object detection. It leverages the correlation among frames in the temporal dimension, requires only minor memory overhead for intermediate feature-map storage and minor architectural changes, and needs no retraining for immediate deployment in existing vision frameworks.
ACCELERATE INFERENCE OF CNNS FOR VIDEO ANALYSIS WHILE PRESERVING EXACTNESS EXPLOITING ACTIVATION SPARSITY
This paper proposes a range-bound-aware convolution layer that accelerates the inference of rectified linear unit (ReLU)-based convolutional neural networks (CNNs) for analyzing video streams. Since…
Towards energy-efficient convolutional neural network inference
TLDR
This thesis first evaluates the capabilities of off-the-shelf software-programmable hardware before diving into specialized hardware accelerators and exploring the potential of extremely quantized CNNs, and gives special consideration to external memory bandwidth.
Training for temporal sparsity in deep neural networks, application in video processing
TLDR
A new DNN layer, called the Delta Activation Layer, is introduced, whose sole purpose is to promote temporal sparsity of activations during training. It is implemented as an extension of the standard TensorFlow-Keras library and applied to train deep neural networks on the Human Action Recognition dataset.
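The temporal-sparsity idea behind such a layer can be sketched as follows; the function name and threshold value are illustrative assumptions, not the paper's API:

```python
import numpy as np

def delta_activation(curr_act, prev_act, threshold=0.05):
    """Hypothetical sketch of a delta-activation step: forward only the
    change between consecutive frames' activations, zeroing deltas below
    `threshold` so that slowly varying features produce sparse updates."""
    delta = curr_act - prev_act
    delta[np.abs(delta) < threshold] = 0.0   # induce temporal sparsity
    new_state = prev_act + delta             # state accumulated downstream
    return delta, new_state
```

Downstream layers then operate on mostly-zero deltas, so compute (or memory traffic) can scale with the amount of temporal change rather than with the full activation size.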
EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators
TLDR
This work introduces and evaluates a novel, hardware-friendly, and lossless compression scheme for the feature maps present within convolutional neural networks, and achieves compression factors for gradient-map compression during training that are even better than for inference.
Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
  • L. Cavigelli, L. Benini
  • Computer Science
  • 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)
  • 2019
TLDR
This work introduces and evaluates a novel, hardware-friendly compression scheme for the feature maps present within convolutional neural networks, and shows that an average compression ratio of 4.4× relative to uncompressed data, and a gain of 60% over the existing method, can be achieved for ResNet-34 with a compression block requiring fewer than 300 bits of sequential cells and minimal combinational logic.
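Why sparse feature maps compress well can be illustrated with a toy zero-run-length encoder. This is a deliberately simplified stand-in for the general zero-collapsing idea, not the bit-plane scheme evaluated above:

```python
def zrle_encode(values):
    """Toy zero run-length encoder: store runs of zeros as counts so that
    mostly-zero (ReLU-sparse) feature maps shrink substantially."""
    out, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            if run:
                out.append(("Z", run))  # a run of `run` zeros
                run = 0
            out.append(("V", v))        # a literal nonzero value
    if run:
        out.append(("Z", run))
    return out

def zrle_decode(tokens):
    """Inverse of zrle_encode: expand zero runs back into zeros."""
    out = []
    for tag, payload in tokens:
        if tag == "Z":
            out.extend([0] * payload)
        else:
            out.append(payload)
    return out
```

A real hardware scheme additionally packs the nonzero values at bit-plane granularity, but the memory-bandwidth benefit comes from the same source: activation sparsity.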
EEG-TCNet: An Accurate Temporal Convolutional Network for Embedded Motor-Imagery Brain–Machine Interfaces
TLDR
This paper proposes EEG-TCNet, a novel temporal convolutional network (TCN) that achieves outstanding accuracy while requiring few trainable parameters, which makes it suitable for embedded classification on resource-limited devices at the edge.
TinyRadarNN: Combining Spatial and Temporal Convolutional Neural Networks for Embedded Gesture Recognition with Short Range Radars
TLDR
A low-power, high-accuracy embedded hand-gesture recognition algorithm targeting battery-operated wearable devices using low-power short-range RADAR sensors is proposed, demonstrating that real-time prediction is feasible with only 21 mW of power consumption for the full TCN sequence-prediction network, while a system-level power consumption of less than 100 mW is achieved.
RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks
TLDR
Random Partition Relaxation (RPR) is presented: a method for strong quantization of neural network weights to binary (+1/-1) and ternary (+1/0/-1) values, together with an SGD-based training method that can be integrated into existing frameworks.
FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things
TLDR
FANN-on-MCU, an open-source toolkit built upon the fast artificial neural network (FANN) library to run lightweight and energy-efficient neural networks on microcontrollers based on both the ARM Cortex-M series and the novel RISC-V-based parallel ultra-low-power (PULP) platform, is presented.

References

Showing 1-10 of 56 references
CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data
TLDR
A novel algorithm is proposed and evaluated for change-based evaluation of CNNs for video data recorded with a static camera setting, exploiting the spatio-temporal sparsity of pixel changes to achieve an average speed-up of 8.6x over a cuDNN baseline on a realistic benchmark.
Accelerating real-time embedded scene labeling with convolutional networks
TLDR
This paper presents an optimized convolutional network implementation suitable for real-time scene labeling on embedded platforms and demonstrates that for scene labeling this approach achieves a 1.5x improvement in throughput when compared to a modern desktop CPU at a power budget of only 11 W.
Clockwork Convnets for Video Semantic Segmentation
TLDR
This work defines a novel family of "clockwork" convnets driven by fixed or adaptive clock signals that schedule the processing of different layers at different update rates according to their semantic stability, and extends clockwork scheduling to adaptive video processing by incorporating data-driven clocks that can be tuned on unlabeled video.
T-CNN: Tubelets With Convolutional Neural Networks for Object Detection From Videos
TLDR
A deep learning framework, called T-CNN, is proposed that incorporates temporal and contextual information from tubelets obtained in videos and dramatically improves the baseline performance of existing still-image detection frameworks when they are applied to videos.
DeepMon: Mobile GPU-based Deep Learning Framework for Continuous Vision Applications
TLDR
This paper proposes DeepMon, a mobile deep learning inference system to run a variety of deep learning inferences purely on a mobile device in a fast and energy-efficient manner, and designs a suite of optimization techniques to efficiently offload convolutional layers to mobile GPUs and accelerate the processing.
Sigma Delta Quantized Networks
TLDR
An optimization method for converting any pre-trained deep network into an optimally efficient Sigma-Delta network is introduced, and it is shown that the algorithm, if run on the appropriate hardware, could cut at least an order of magnitude from the computational cost of processing video data.
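The scalar principle behind sigma-delta encoding, which underlies that conversion, can be sketched as follows; the function below is an illustrative assumption, not the paper's network-level algorithm:

```python
def sigma_delta_encode(signal, step=0.25):
    """Toy sigma-delta sketch: instead of transmitting each sample, send
    integer multiples of `step` that track the running signal, so slowly
    varying inputs produce mostly-zero (i.e. cheap) messages."""
    acc = 0.0          # encoder's accumulated residual error (sigma)
    recon = 0.0        # value the decoder currently holds
    messages = []
    for x in signal:
        acc += x - recon           # accumulate the tracking error
        n = round(acc / step)      # quantize the accumulated error (delta)
        acc -= n * step
        recon += n * step          # decoder reconstructs by summing n*step
        messages.append(n)
    return messages
```

For a constant input the encoder emits one nonzero message and then zeros, which is exactly the temporal-redundancy saving the paper exploits at the activation level.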
Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
TLDR
A language close to the mathematics of deep learning, called Tensor Comprehensions, offering both imperative and declarative styles, a polyhedral just-in-time compiler to convert a mathematical description of a deep learning DAG into a CUDA kernel with delegated memory management and synchronization, and a compilation cache populated by an autotuner are contributed.
Temporal Pyramid Pooling-Based Convolutional Neural Network for Action Recognition
TLDR
A novel network structure, which allows an arbitrary number of frames as the network input, is proposed and can be learned on a small target data set because it can leverage the off-the-shelf image-level CNN for model parameter initialization.
DeepCache: Principled Cache for Mobile Deep Vision
TLDR
The implementation of DeepCache works with unmodified deep learning models, requires no manual developer effort, and is therefore immediately deployable on off-the-shelf mobile devices.
Evaluation of neural network architectures for embedded systems
TLDR
This work presents a comprehensive analysis of metrics important in practical applications: accuracy, memory footprint, parameter count, operation count, inference time, and power consumption, providing a compelling set of information that helps design and engineer efficient DNNs.