FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

@inproceedings{Umuroglu2017FINNAF,
  title={FINN: A Framework for Fast, Scalable Binarized Neural Network Inference},
  author={Yaman Umuroglu and Nicholas J. Fraser and Giulio Gambardella and Michaela Blott and Philip Heng Wai Leong and Magnus Jahre and Kees A. Vissers},
  booktitle={Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays},
  year={2017}
}
Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. [...] Key result: to the best of our knowledge, ours are the fastest classification rates reported to date on these benchmarks.
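The core claim above has a compact arithmetic intuition: once weights and activations are restricted to {-1, +1} and encoded as bits (+1 → 1, -1 → 0), a dot product reduces to XNOR and popcount. The following minimal sketch illustrates that reduction; it is not FINN's actual datapath, and the helper names are hypothetical.

```python
# Minimal sketch of the XNOR-popcount reduction behind BNN inference
# (illustrative only, not FINN's actual datapath; helper names are made up).

def binarize(x):
    """Map a real-valued vector to {-1, +1} via the sign function."""
    return [1 if v >= 0 else -1 for v in x]

def to_bits(vec):
    """Encode a {-1, +1} vector as an integer bit mask (+1 -> 1, -1 -> 0)."""
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(w_bits, a_bits, n):
    """Dot product of two length-n {-1, +1} vectors from their bit encodings.

    Matching bit positions contribute +1 and mismatching ones -1, so the
    result equals n - 2 * popcount(w XOR a): the XNOR-popcount formulation.
    """
    mismatches = bin(w_bits ^ a_bits).count("1")
    return n - 2 * mismatches

w = binarize([0.3, -1.2, 0.7, -0.1])   # -> [ 1, -1,  1, -1]
a = binarize([0.9, 0.4, -0.5, -2.0])   # -> [ 1,  1, -1, -1]
assert binary_dot(to_bits(w), to_bits(a), len(w)) == sum(wi * ai for wi, ai in zip(w, a))
```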
Multi-precision convolutional neural networks on heterogeneous hardware
TLDR
This paper proposes an alternative based on a multi-precision CNN framework that combines a binarised and a floating point CNN in a pipeline configuration deployed on heterogeneous hardware, and demonstrates a flexible trade-off between accuracy and throughput.
Scaling Binarized Neural Networks on Reconfigurable Logic
TLDR
It is shown how padding can be employed on BNNs while still maintaining a 1-bit datapath and high accuracy, and it is believed that a large BNN requiring 1.2 billion operations per frame can classify images at 12 kFPS with 671 μs latency while drawing less than 41 W board power and classifying CIFAR-10 images at 88.7% accuracy.
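Keeping padding inside a 1-bit datapath is plausible because a binary feature map only ever holds two values, so a pad value chosen from {-1, +1} never widens the representation. The sketch below pads with -1 purely to illustrate that constraint; it is an assumption for exposition, not necessarily the exact scheme used in the cited paper.

```python
# Illustrative sketch: padding a {-1, +1} feature map without leaving a
# 1-bit representation, by padding with -1 (an assumed choice for exposition).

def pad_binary(fmap, pad=1):
    """Pad a 2-D {-1, +1} feature map with -1 on all sides.

    Because the pad value is itself in {-1, +1}, every element of the padded
    map still fits in one bit, so the datapath can stay 1-bit wide.
    """
    h, w = len(fmap), len(fmap[0])
    padded = [[-1] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = fmap[i][j]
    return padded

assert pad_binary([[1]]) == [[-1, -1, -1],
                             [-1,  1, -1],
                             [-1, -1, -1]]
```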
BNNsplit: Binarized Neural Networks for embedded distributed FPGA-based computing systems
TLDR
A framework that extends FINN to a distributed scenario, enabling BNN implementation on embedded multi-FPGA systems; the approach is well suited to FPGAs, which are known to stand out when dealing with binary operations.
LUTNet: Rethinking Inference in FPGA Soft Logic
TLDR
LUTNet is proposed, an end-to-end hardware-software framework for the construction of area-efficient FPGA-based neural network accelerators using the native LUTs as inference operators, and it is demonstrated that the exploitation of LUT flexibility allows for far heavier pruning than possible in prior works, resulting in significant area savings while achieving comparable accuracy.
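The LUT-as-operator idea can be illustrated with a small example: instead of hard-wiring each inference operator to XNOR, a K-input lookup table can compute an arbitrary learned Boolean function of K binary activations. The sketch below shows the mechanism only; it is not LUTNet's framework, and the truth table is a hypothetical learned function.

```python
# Illustrative sketch of a K-input LUT used as an inference operator
# (mechanism only; not LUTNet's actual framework).

def make_lut(truth_table):
    """Build a Boolean operator from a truth table; truth_table[i] is the
    output for the input bits read as the binary number i."""
    def lut(*bits):
        idx = 0
        for b in bits:
            idx = (idx << 1) | b
        return truth_table[idx]
    return lut

# A hypothetical "learned" 2-input function; with this table it equals XNOR,
# the operator a plain BNN would use, but any 4-entry table is possible.
op = make_lut([1, 0, 0, 1])
assert op(0, 0) == 1 and op(1, 0) == 0 and op(1, 1) == 1
```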
BitStream: An efficient framework for inference of binary neural networks on CPUs
TLDR
BitStream proposes a general architecture for efficient inference of BNNs on CPUs, with a simple but novel computation flow in which all layers, including convolutional, binarization, and pooling layers, are calculated in binary precision.
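The claim that even pooling stays in binary precision has a simple justification: over values restricted to {-1, +1} and encoded as bits, max-pooling is a bitwise OR of the packed words. The sketch below illustrates that identity; it is not BitStream's code, and the channel-packed layout is an assumption.

```python
# Illustrative sketch (not BitStream's code): 2x2 max-pooling on a binary
# feature map where each pixel's channels are packed into one integer
# (+1 -> 1, -1 -> 0). Max over {-1, +1} is +1 whenever any input is +1,
# so pooling reduces to a bitwise OR of the packed words.

def binary_maxpool_2x2(fmap):
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(0, h - 1, 2):
        row = []
        for j in range(0, w - 1, 2):
            row.append(fmap[i][j] | fmap[i][j + 1]
                       | fmap[i + 1][j] | fmap[i + 1][j + 1])
        out.append(row)
    return out

# 2x2 spatial input, 4 channels per pixel packed into one integer each.
fmap = [[0b1010, 0b0001],
        [0b0100, 0b0001]]
assert binary_maxpool_2x2(fmap) == [[0b1111]]
```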
Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA
TLDR
Two types of fast and energy-efficient architectures for BNN inference are proposed, and analysis and insights are provided for picking the better of the two strategies for different datasets and network models.
POLYBiNN: A Scalable and Efficient Combinatorial Inference Engine for Neural Networks on FPGA
TLDR
This work introduces POLYBiNN, a scalable and efficient combinatorial inference engine for DNNs and CNNs that is composed of a stack of decision trees and utilizes AND-OR gates instead of multipliers and accumulators.
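The substitution of AND-OR gates for multipliers and accumulators can be seen in miniature: a decision tree over binary features flattens into a sum-of-products Boolean expression that needs only AND, OR, and NOT. The sketch below shows that equivalence for a depth-2 tree; it is an illustration of the general idea, not POLYBiNN itself.

```python
# Illustrative sketch (not POLYBiNN): a tiny decision tree over binary
# features versus the equivalent sum-of-products AND-OR logic.

def tree_predict(x0, x1):
    # Decision tree: branch on x0; leaves return x1 or NOT x1.
    if x0:
        return x1
    return 1 - x1

def and_or_predict(x0, x1):
    # Same function as two AND terms OR-ed together:
    # (x0 AND x1) OR (NOT x0 AND NOT x1)
    return (x0 & x1) | ((1 - x0) & (1 - x1))

for x0 in (0, 1):
    for x1 in (0, 1):
        assert tree_predict(x0, x1) == and_or_predict(x0, x1)
```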
Throughput Optimizations for FPGA-based Deep Neural Network Inference
TLDR
This paper proposes novel architectures for the inference of previously learned and arbitrary deep neural networks on FPGA-based SoCs that are able to overcome limitations and surpass the data throughput of fully-featured x86-based systems while only using a fraction of their energy consumption.
A Fully Onchip Binarized Convolutional Neural Network FPGA Implementation with Accurate Inference
TLDR
A new BNN algorithm, called Parallel-Convolution BNN (i.e. PC-BNN), is proposed, which replaces the original binary convolution layer in a conventional BNN with two parallel binary convolution layers and could greatly reduce the energy and delay overhead of loading network parameters from off-chip memory.

References

Showing 1-10 of 35 references
Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1
TLDR
A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
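The speedup from a binary matrix multiplication kernel comes from packing {-1, +1} values into machine words so that many multiply-accumulates collapse into a handful of XOR and popcount operations per word. The sketch below shows the packing idea on the CPU; it is an illustration, not the paper's GPU kernel, and the 64-bit word width is an assumption.

```python
# Illustrative CPU sketch of bit-packed binary matrix multiplication
# (not the paper's GPU kernel). Values in {-1, +1} are packed 64 per word
# (+1 -> 1, -1 -> 0); each dot product becomes XOR + popcount per word.

WORD = 64

def pack_row(row):
    """Pack a {-1, +1} row into 64-bit words."""
    words = []
    for start in range(0, len(row), WORD):
        w = 0
        for i, v in enumerate(row[start:start + WORD]):
            if v == 1:
                w |= 1 << i
        words.append(w)
    return words

def packed_dot(wa, wb, n):
    """Dot product of two packed {-1, +1} rows of logical length n."""
    mismatches = sum(bin(x ^ y).count("1") for x, y in zip(wa, wb))
    return n - 2 * mismatches

def binary_matmul(A, B_T):
    """C = A @ B for {-1, +1} matrices; B is supplied transposed so that
    both operands are packed along the reduction (inner) dimension."""
    n = len(A[0])
    A_p = [pack_row(r) for r in A]
    B_p = [pack_row(r) for r in B_T]
    return [[packed_dot(ra, rb, n) for rb in B_p] for ra in A_p]

A = [[1, -1, 1, -1]]
B_T = [[1, 1, -1, -1], [1, 1, 1, 1]]   # columns of B, stored as rows
assert binary_matmul(A, B_T) == [[0, 0]]
```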
fpgaConvNet: A Framework for Mapping Convolutional Neural Networks on FPGAs
Convolutional Neural Networks (ConvNets) are a powerful Deep Learning model, providing state-of-the-art accuracy to many emerging classification problems. However, ConvNet classification is a [...]
Accelerating Binarized Neural Networks: Comparison of FPGA, CPU, GPU, and ASIC
TLDR
This paper proposes a BNN hardware accelerator design, implements it on an Arria 10 FPGA as well as a 14-nm ASIC, and compares them against optimized software on a Xeon server CPU, an Nvidia Titan X server GPU, and an Nvidia TX1 mobile GPU.
Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
TLDR
Hardware specialization in the form of GPGPUs, FPGAs, and ASICs offers a promising path towards major leaps in processing capability while achieving high energy efficiency, and combining multiple FPGAs over a low-latency communication fabric offers further opportunity to train and evaluate models of unprecedented size and quality.
YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights
TLDR
A HW accelerator optimized for BinaryConnect CNNs that achieves 1510 GOp/s on a core area of only 1.33 MGE and with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V is presented.
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks
TLDR
This work presents a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering FPGA resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth.
Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks
TLDR
A novel dataflow, called row-stationary (RS), is presented that minimizes data-movement energy consumption on a spatial architecture, adapts to different CNN shape configurations, and reduces all types of data movement by maximally utilizing processing-engine local storage, direct inter-PE communication, and spatial parallelism.
Ternary neural networks for resource-efficient AI applications
TLDR
This paper proposes ternary neural networks (TNNs) in order to make deep learning more resource-efficient, and designs a purpose-built hardware architecture for TNNs and implements it on FPGA and ASIC.
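Ternary weights extend the binary idea with an explicit zero: each value maps to {-1, 0, +1}, typically by thresholding its magnitude. The sketch below shows threshold-based ternarization in its simplest form; the fixed threshold is a hypothetical choice, not this paper's exact training scheme.

```python
# Minimal sketch of threshold-based ternarization to {-1, 0, +1}
# (the fixed threshold is a hypothetical choice; papers usually derive it
# from per-layer weight statistics).

def ternarize(x, delta=0.05):
    out = []
    for v in x:
        if abs(v) <= delta:
            out.append(0)          # small magnitudes become an explicit zero
        else:
            out.append(1 if v > 0 else -1)
    return out

assert ternarize([0.3, -0.02, 0.0, -0.7]) == [1, 0, 0, -1]
```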
CNP: An FPGA-based processor for Convolutional Networks
TLDR
The implementation exploits the inherent parallelism of ConvNets and takes full advantage of multiple hardware multiply-accumulate units on the FPGA, and can be used for low-power, lightweight embedded vision systems for micro-UAVs and other small robots.
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
TLDR
This work implements a CNN accelerator on a VC707 FPGA board and compares it to previous approaches, achieving a peak performance of 61.62 GFLOPS under a 100 MHz working frequency, which outperforms previous approaches significantly.