BranchyNet: Fast inference via early exiting from deep neural networks
- Surat Teerapittayanon, Bradley McDanel, H. T. Kung
- Computer Science · International Conference on Pattern Recognition
- 1 December 2016
BranchyNet is presented, a novel deep network architecture augmented with additional side-branch classifiers that can both improve accuracy and significantly reduce the inference time of the network.
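
A minimal sketch of the early-exit idea, assuming an entropy-based confidence test at each side branch; the helper names `feature_fn`, `classifier_fn`, and the per-branch thresholds are illustrative, not the paper's API:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def entropy(probs, eps=1e-12):
    # Shannon entropy of a probability vector (low entropy = confident).
    return -np.sum(probs * np.log(probs + eps))

def branchy_infer(x, branches, thresholds):
    # Walk the backbone stage by stage; exit at the first side branch whose
    # softmax entropy falls below its threshold, or at the final classifier.
    h = x
    for i, (feature_fn, classifier_fn) in enumerate(branches):
        h = feature_fn(h)                  # shared backbone computation
        probs = softmax(classifier_fn(h))  # side-branch (or final) classifier
        if entropy(probs) < thresholds[i] or i == len(branches) - 1:
            return np.argmax(probs), i     # prediction and exit point taken
```

Easy samples exit at an early branch; only the harder ones pay for the full network.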
Distributed Deep Neural Networks Over the Cloud, the Edge and End Devices
- Surat Teerapittayanon, Bradley McDanel, H. T. Kung
- Computer Science · IEEE International Conference on Distributed…
- 5 June 2017
Compared with the traditional method of offloading raw sensor data to be processed in the cloud, DDNN locally processes most sensor data on end devices while achieving high accuracy and is able to reduce the communication cost by a factor of over 20x.
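
A sketch of the local-exit-or-offload decision, reusing the `softmax` and `entropy` helpers from the BranchyNet sketch above; `device_model`, `device_classifier`, and `cloud_fn` are placeholder names for the device-side layers, the local exit classifier, and the cloud-side remainder of the network:

```python
def ddnn_infer(x, device_model, device_classifier, cloud_fn, threshold):
    # Device-side portion runs locally; exit locally when confident enough,
    # otherwise send only the compact intermediate features (not raw sensor
    # data) to the cloud for the rest of the inference.
    features = device_model(x)                    # runs on the end device
    probs = softmax(device_classifier(features))  # local exit classifier
    if entropy(probs) < threshold:
        return np.argmax(probs), "local exit"
    return cloud_fn(features), "features offloaded to cloud"
```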
Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
- H. T. Kung, Bradley McDanel, S. Zhang
- Computer Science · International Conference on Architectural Support…
- 7 November 2018
This paper describes a novel approach of packing sparse convolutional neural networks into a denser format for efficient implementation on systolic arrays, and demonstrates that, to mitigate data privacy concerns, the associated retraining can be accomplished with only a fraction of the original dataset.
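
A greedy sketch of the column-combining idea under simplifying assumptions: columns of a sparse weight matrix are grouped so that, within a group, each row carries at most a fixed number of nonzeros; the paper's conflict pruning and retraining under joint optimization are not modeled here.

```python
import numpy as np

def combine_columns(W, max_per_row=1):
    # Greedily pack sparse columns of W into groups whose nonzero rows do not
    # exceed max_per_row per row; each group maps onto one physical column of
    # the systolic array, giving a denser packed format.
    groups = []  # each group: (list of column indices, per-row nonzero counts)
    for c in range(W.shape[1]):
        support = (W[:, c] != 0).astype(int)
        for cols, counts in groups:
            if np.all(counts + support <= max_per_row):
                cols.append(c)
                counts += support
                break
        else:
            groups.append(([c], support.copy()))
    return [cols for cols, _ in groups]
```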
Embedded Binarized Neural Networks
- Bradley McDanel, Surat Teerapittayanon, H. T. Kung
- Computer Science · European Conference/Workshop on Wireless Sensor…
- 20 February 2017
eBNN reorders the computation of inference while preserving the original BNN structure, and uses just a single floating-point temporary for the entire neural network, leading to a 32x reduction in required temporary space.
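
A sketch of the reordering idea for a binary dense layer, assuming ±1-valued inputs and weights and a per-output threshold standing in for the folded batch-norm parameters; the bit-packing and XNOR/popcount kernels of an actual BNN implementation are omitted:

```python
import numpy as np

def ebnn_dense_layer(x_bits, W_bits, thresholds):
    # Compute one output at a time: a single float temporary `acc` holds the
    # pre-activation, which is thresholded and binarized immediately, so no
    # floating-point feature map is ever materialized.
    y_bits = np.empty(W_bits.shape[0], dtype=np.int8)
    for j in range(W_bits.shape[0]):
        acc = float(np.dot(W_bits[j], x_bits))   # the single float temporary
        y_bits[j] = 1 if acc >= thresholds[j] else -1
    return y_bits
```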
Term Quantization: Furthering Quantization at Run Time
- H. T. Kung, Bradley McDanel, S. Zhang
- Computer Science · International Conference for High Performance…
- 1 November 2020
We present a novel technique, called Term Quantization (TQ), for furthering quantization at run time for improved computational efficiency of deep neural networks (DNNs) already quantized with…
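
A rough sketch of group-wise term quantization under stated assumptions: each already-quantized integer is expanded into its power-of-two terms, and only a fixed budget of the largest terms across the group is kept at run time; the exact term-selection and budgeting rules in the paper may differ.

```python
import numpy as np

def term_quantize_group(values, group_budget):
    # Expand each value into (exponent, value index, sign) terms, keep the
    # group_budget largest terms across the whole group, drop the rest, and
    # reconstruct the truncated values.
    terms = []
    for i, v in enumerate(values):
        v, e = int(abs(v)), 0
        while v:
            if v & 1:
                terms.append((e, i, int(np.sign(values[i]))))
            v >>= 1
            e += 1
    kept = sorted(terms, key=lambda t: t[0], reverse=True)[:group_budget]
    out = np.zeros(len(values), dtype=np.int64)
    for e, i, s in kept:
        out[i] += s * (1 << e)
    return out
```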
Full-stack optimization for accelerating CNNs using powers-of-two weights with FPGA validation
- Bradley McDanel, S. Zhang, H. T. Kung, Xin Dong
- Computer Science · International Conference on Supercomputing
- 26 June 2019
A highlight of this full-stack optimization framework is an efficient Selector-Accumulator (SAC) architecture for implementing CNNs with powers-of-two weights which has 9x higher energy efficiency compared to other implementations while achieving comparable latency.
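
A software sketch of the selector-accumulator idea: with weights constrained to signed powers of two, each multiply reduces to selecting a shifted copy of the fixed-point input and accumulating it. The `exps`/`signs` encoding of the weights (sign 0 meaning a pruned weight) is an assumption for illustration:

```python
def sac_dot(x_fixed, exps, signs):
    # Dot product with powers-of-two weights: weight = sign * 2**exp, so the
    # multiply becomes a shift-and-add on fixed-point inputs.
    acc = 0
    for xi, e, s in zip(x_fixed, exps, signs):
        if s != 0:
            acc += s * (int(xi) << int(e))   # shift instead of multiply
    return acc
```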
Maestro: A Memory-on-Logic Architecture for Coordinated Parallel Use of Many Systolic Arrays
- H. T. Kung, Bradley McDanel, S. Zhang, Xin Dong, Chih Chiang Chen
- Computer Science · IEEE International Conference on Application…
- 15 July 2019
The Maestro memory-on-logic architecture is described, including a circuit and layout design, detailed scheduling of the switch, an analysis of system performance for real-time inference with batch size equal to one, and showcase applications for deep learning inference.
Adaptive Tiling: Applying Fixed-size Systolic Arrays To Sparse Convolutional Neural Networks
- H. T. Kung, Bradley McDanel, S. Zhang
- Computer Science · International Conference on Pattern Recognition
- 1 August 2018
We introduce adaptive tiling, a method of partitioning layers in a sparse convolutional neural network (CNN) into blocks of filters and channels, called tiles, each implementable with a fixed-size…
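
A very rough sketch, under strong simplifying assumptions, of cutting a layer's weight matrix into tiles that each fit a fixed-size array, where sparser regions yield wider tiles; this is not the paper's packing scheme, only an illustration of sparsity-dependent tile boundaries.

```python
import numpy as np

def adaptive_tiles(W, array_rows, array_cols):
    # Cut W (filters x channels) into tiles of at most array_rows filters,
    # extending each tile over channels until the channels that actually
    # contain nonzeros would no longer fit in array_cols columns.
    tiles = []
    for r0 in range(0, W.shape[0], array_rows):
        c0 = 0
        while c0 < W.shape[1]:
            c1, used = c0, 0
            while c1 < W.shape[1]:
                has_nonzero = bool(np.any(W[r0:r0 + array_rows, c1] != 0))
                if used + int(has_nonzero) > array_cols:
                    break
                used += int(has_nonzero)
                c1 += 1
            tiles.append((r0, min(r0 + array_rows, W.shape[0]), c0, c1))
            c0 = c1
    return tiles
```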
Systolic Building Block for Logic-on-Logic 3D-IC Implementations of Convolutional Neural Networks
- H. T. Kung, Bradley McDanel, Douglas Yu
- Computer Science · International Symposium on Circuits and Systems
- 1 May 2019
The building block can form systolic arrays for implementing low-latency, energy-efficient CNN inference for models of any size, while incorporating advanced packaging features such as “logic-on-logic” 3D-IC (micro-bump/TSV, monolithic 3D or other 3D technology).
FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding
- S. Zhang, Bradley McDanel, H. T. Kung
- Computer Science · International Symposium on High-Performance…
- 28 October 2021
A Fast First, Accurate Second Training (FAST) system for DNNs is proposed, in which the weights, activations, and gradients are represented in BFP, demonstrating a 2-6× speedup in training on a single-chip platform over prior work based on mixed-precision or block floating point number systems.
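
A sketch of block floating point quantization with stochastic rounding, the numeric format the FAST system trains in; the shared-exponent and rounding details here are a common formulation rather than necessarily the paper's exact scheme, and `mantissa_bits` is where FAST's variable precision would come in.

```python
import numpy as np

def bfp_quantize(block, mantissa_bits, rng=None):
    # Quantize a block of values to block floating point: one shared exponent
    # (from the largest magnitude) plus low-precision mantissas, rounded
    # stochastically so the rounding error is unbiased in expectation.
    rng = np.random.default_rng() if rng is None else rng
    max_abs = np.max(np.abs(block))
    if max_abs == 0:
        return np.zeros_like(block)
    shared_exp = np.floor(np.log2(max_abs))            # shared block exponent
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))  # value of one mantissa step
    scaled = block / scale
    floor = np.floor(scaled)
    mant = floor + (rng.random(block.shape) < scaled - floor)  # stochastic rounding
    # The largest positive value may clip by one step -- acceptable for a sketch.
    mant = np.clip(mant, -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1)
    return mant * scale
```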
...