Corpus ID: 6856808

Sigma Delta Quantized Networks

Peter O'Connor, Max Welling
Deep neural networks can be obscenely wasteful. When processing video, a convolutional network expends a fixed amount of computation for each frame with no regard to the similarity between neighbouring frames. As a result, it ends up repeatedly doing very similar computations. To put an end to such waste, we introduce Sigma-Delta networks. With each new input, each layer in this network sends a discretized form of its change in activation to the next layer. Thus the amount of computation that… 
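The mechanism described in the abstract, sending a discretized change in activation and letting the receiver sum it back up, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class names, the round-to-nearest quantizer with error carry-over, and the random weights are all assumptions made for illustration.

```python
import numpy as np


class SigmaDeltaEncoder:
    """Temporal difference (delta) followed by rounding with error carry-over."""

    def __init__(self, shape):
        self.prev = np.zeros(shape)      # previous activation
        self.residual = np.zeros(shape)  # rounding error not yet transmitted

    def __call__(self, x):
        target = (x - self.prev) + self.residual  # change plus owed error
        sent = np.round(target)                   # discretized message
        self.residual = target - sent             # carry the error forward
        self.prev = np.array(x, dtype=float)
        return sent.astype(int)


class Accumulator:
    """Running sum (sigma) that rebuilds a signal from its discretized changes."""

    def __init__(self, shape):
        self.total = np.zeros(shape)

    def __call__(self, message):
        self.total = self.total + message
        return self.total


def demo(seed=0):
    rng = np.random.default_rng(seed)
    frame = 3.0 * rng.normal(size=8)   # hypothetical activation vector
    weights = rng.normal(size=(4, 8))  # hypothetical next-layer weights

    encoder = SigmaDeltaEncoder(8)
    rebuilt_input = Accumulator(8)
    layer_output = Accumulator(4)

    nonzeros = []
    for x in (frame, frame, frame + 0.01):  # three near-identical "frames"
        message = encoder(x)
        nonzeros.append(int(np.count_nonzero(message)))
        approx = rebuilt_input(message)
        # Only the nonzero entries of `message` contribute to this product.
        out = layer_output(weights @ message)
    return frame + 0.01, approx, out, weights, nonzeros
```

Because the next layer is linear, accumulating `weights @ message` gives the same result as applying `weights` to the accumulated input, so work is needed only where `message` is nonzero. In this sketch, a repeated frame produces an all-zero message, and the rebuilt input stays within 0.5 of the true activation.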

CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams

  • L. Cavigelli, L. Benini
  • Computer Science
    IEEE Transactions on Circuits and Systems for Video Technology
  • 2020
This work adopts an orthogonal viewpoint and proposes a novel algorithm exploiting the spatio-temporal sparsity of pixel changes that resulted in an average speed-up of 9.1X over cuDNN on the Tegra X2 platform at a negligible accuracy loss and a lower power consumption.

Learning to Sparsify Differences of Synaptic Signal for Efficient Event Processing

Experiments show that the proposed framework can reduce MAC by a factor of 32 to 240 compared to dense convolution while maintaining comparable accuracy, which is several times better than the current state-of-the-art methods.

DNN Feature Map Compression using Learned Representation over GF(2)

The proposed network architectures derived from modified SqueezeNet and MobileNetV2 to the tasks of ImageNet classification and PASCAL VOC object detection show a factor of 2 decrease in memory requirements with minor degradation in accuracy while adding only bitwise computations.

Delta Networks for Optimized Recurrent Network Computation

It is shown that a naive run-time delta network implementation offers modest improvements on the number of memory accesses and computes, but optimized training techniques confer higher accuracy at higher speedup.

Towards energy-efficient convolutional neural network inference

This thesis first evaluates the capabilities of off-the-shelf software-programmable hardware before diving into specialized hardware accelerators and exploring the potential of extremely quantized CNNs, and gives special consideration to external memory bandwidth.

A Survey on Methods and Theories of Quantized Neural Networks

A thorough review is given of different aspects of quantized neural networks, which are recognized as one of the most effective approaches to satisfy the extreme memory requirements that deep neural network models demand.

Temporally Efficient Deep Learning with Spikes

This work presents a variant of backpropagation for neural networks in which computation scales with the rate of change of the data, not the rate at which the data is processed. It does this by having neurons communicate a combination of their state and their temporal change in state.

Training for temporal sparsity in deep neural networks, application in video processing

A new DNN layer, called the Delta Activation Layer, is introduced whose sole purpose is to promote temporal sparsity of activations during training. It is implemented as an extension of the standard TensorFlow-Keras library and applied to train deep neural networks on the Human Action Recognition dataset.

EVA²: Exploiting Temporal Redundancy in Live Computer Vision

A new algorithm, activation motion compensation, detects changes in the visual input and incrementally updates a previously computed activation, applying well-known motion estimation techniques to adapt to visual changes and avoid unnecessary computation on most frames.

EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

This work introduces and evaluates a novel, hardware-friendly, and lossless compression scheme for the feature maps present within convolutional neural networks, and achieves compression factors for gradient map compression during training that are even better than for inference.



Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.

Very Deep Convolutional Networks for Large-Scale Image Recognition

This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing

The method for converting an ANN into an SNN enables low-latency classification with high accuracies already after the first output spike, and compared with previous SNN approaches it yields improved performance without increased training time.

Deep Spiking Networks

It is shown that the spiking Multi-Layer Perceptron behaves identically, during both prediction and training, to a conventional deep network of rectified-linear units, in the limiting case where the network is run for a long time.

Training Deep Spiking Neural Networks Using Backpropagation

A novel technique is introduced, which treats the membrane potentials of spiking neurons as differentiable signals, where discontinuities at spike times are considered as noise. This enables an error backpropagation mechanism for deep SNNs that follows the same principles as in conventional deep networks, but works directly on spike signals and membrane potentials.

Convolutional networks for fast, energy-efficient neuromorphic computing

This approach allows the algorithmic power of deep learning to be merged with the efficiency of neuromorphic processors, bringing the promise of embedded, intelligent, brain-inspired computing one step closer.

Fast and Efficient Asynchronous Neural Computation with Adapting Spiking Neural Networks

It is shown that these adaptive spiking neurons can be drop-in replacements for ReLU neurons in standard feedforward ANNs comprised of such units, and that this can also be successfully applied to a ReLU-based deep convolutional neural network for classifying the MNIST dataset.

A 128×128 120 dB 15 μs Latency Asynchronous Temporal Contrast Vision Sensor

This silicon retina provides an attractive combination of characteristics for low-latency dynamic vision under uncontrolled illumination with low post-processing requirements by providing high pixel bandwidth, wide dynamic range, and precisely timed sparse digital output.

ImageNet Large Scale Visual Recognition Challenge

The creation of this benchmark dataset and the advances in object recognition that have been possible as a result are described, and state-of-the-art computer vision accuracy is compared with human accuracy.