Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays

@inproceedings{Bose2020FullyEF,
  title={Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays},
  author={Laurie Bose and Jianing Chen and Stephen J. Carey and Piotr Dudek and W. Mayol-Cuevas},
  booktitle={ECCV},
  year={2020}
}
We present a novel method of CNN inference for pixel processor array (PPA) vision sensors, designed to take advantage of their massive parallelism and analog compute capabilities. PPA sensors consist of an array of processing elements (PEs), with each PE capable of light capture, data storage and computation, allowing various computer vision processing to be executed directly upon the sensor device. The key idea behind our approach is storing network weights "in-pixel" within the PEs of the PPA… 

High-speed Light-weight CNN Inference via Strided Convolutions on a Pixel Processor Array
TLDR
This work uses a novel visual device, the pixel processor array (PPA), to embed a convolutional neural network (CNN) onto the focal plane, allowing all multiplications to be replaced by more efficient addition/subtraction operations.
Near-Sensor Inference Architecture with Region Aware Processing
TLDR
A pixel processing architecture that facilitates CNN inference near the image sensor, addresses problems related to mapping computations onto an array of pixel processors, and introduces a suitable network structure for communication.
On-Sensor Binarized Fully Convolutional Neural Network with A Pixel Processor Array
TLDR
The first implementation of an FCN on a PPA device, performing three convolution layers entirely in the pixel-level processors, is demonstrated. The architecture is used for inference that generates heat maps for object segmentation and localisation at over 280 FPS on the SCAMP-5 PPA vision chip.
An Ultra Fast Low Power Convolutional Neural Network Image Sensor with Pixel-level Computing
TLDR
A Processing-In-Pixel (PIP) CMOS sensor architecture, which allows convolution operation before the column readout circuit to significantly improve the image reading speed with much lower power consumption, is proposed.
A Reconfigurable Convolution-in-Pixel CMOS Image Sensor Architecture
TLDR
This paper proposes a PIP based CMOS sensor architecture, which allows convolution operation before the column readout circuit to significantly improve the image reading speed with much lower power consumption.
Towards an Efficient CNN Inference Architecture Enabling In-Sensor Processing
TLDR
An attention-based pixel processing architecture to facilitate CNN inference near the image sensor that significantly reduces dynamic power consumption and achieves a high speed-up, surpassing the computational capabilities of existing embedded processors.
HARP: Hierarchical Attention Oriented Region-Based Processing for High-Performance Computation in Vision Sensor
TLDR
A hardware architecture for smart cameras that understands the salient regions from an image frame and then performs high-level inference computation for sensor-level information creation instead of transporting raw pixels is presented.
Direct Servo Control from In-Sensor CNN Inference with A Pixel Processor Array
This work demonstrates direct visual sensory-motor control using high-speed CNN inference via a SCAMP-5 Pixel Processor Array (PPA). We demonstrate how PPAs are able to efficiently bridge the gap
Time-Ordered Recent Event (TORE) Volumes for Event Cameras
TLDR
Time-Ordered Recent Event (TORE) volumes are designed to compactly store raw spike timing information with minimal information loss and are an easy-to-implement replacement for any algorithm currently utilizing event representations.
Near-Sensor Distributed DNN Processing for Augmented and Virtual Reality
TLDR
This work explores how to optimally map DNN models on an AR/VR compute platform that consists of an on-sensor processor and an edge processor to minimize energy and latency, and develops the basic principles on network split, parameter caching, and two-processor balancing to achieve near-optimal system designs.

References

A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays
TLDR
A convolutional neural network implementation for pixel processor array (PPA) sensors, a first step towards embedding neural network processing capability directly onto the focal plane of a sensor.
Optimising convolutional neural networks for super fast inference on focal-plane sensor-processor arrays
TLDR
An in-depth FPSP-specific optimisation of all components constituting a CNN allows the architecture to beat the previous baseline by a margin of more than 4%, and reaches a testing accuracy of 96.9% on the MNIST dataset.
ShiDianNao: Shifting vision processing closer to the sensor
TLDR
This paper proposes an accelerator which is 60x more energy efficient than the previous state-of-the-art neural network accelerator, designed down to the layout at 65 nm, with a modest footprint and consuming only 320 mW, but still about 30x faster than high-end GPUs.
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
TLDR
A novel dataflow, called row-stationary (RS), is presented, that minimizes data movement energy consumption on a spatial architecture and can adapt to different CNN shape configurations and reduces all types of data movement through maximally utilizing the processing engine local storage, direct inter-PE communication and spatial parallelism.
NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps
TLDR
A flexible and efficient CNN accelerator architecture called NullHop that implements SOA CNNs useful for low-power and low-latency application scenarios and exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements is proposed.
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs
TLDR
The design of a BNN accelerator is presented that is synthesized from C++ to FPGA-targeted Verilog and outperforms existing FPGA-based CNN accelerators in GOPS as well as energy and resource efficiency.
Visual Odometry for Pixel Processor Arrays
TLDR
This work introduces methods of image scaling, rotation and alignment which are performed solely upon the PPA itself and form the basis for conducting motion estimation, and demonstrates the algorithms on a SCAMP-5 vision chip, achieving frame rates >1000Hz at ~2W power consumption.
A 100,000 fps vision sensor with embedded 535GOPS/W 256×256 SIMD processor array
TLDR
A vision chip operating with 1.9pJ/OP efficiency has been fabricated in 0.18μm CMOS and exploited to conduct real-time image processing operations at 100,000fps, locating a closed-shape object from amongst clutter.
14.6 A 1.42TOPS/W deep convolutional neural network recognition processor for intelligent IoE systems
In this paper, we present an energy-efficient CNN processor with 4 key features: (1) a CNN-optimized neuron processing engine (NPE), (2) a dual-range multiply-accumulate (DRMAC) block for low-power