Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale
@article{Iandola2016ExploringTD,
  title   = {Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale},
  author  = {Forrest N. Iandola},
  journal = {ArXiv},
  year    = {2016},
  volume  = {abs/1612.06519}
}
In recent years, the research community has discovered that deep neural networks (DNNs) and convolutional neural networks (CNNs) can yield higher accuracy than all previous solutions to a broad array of machine learning problems. To our knowledge, there is no single CNN/DNN architecture that solves all problems optimally. Instead, the “right” CNN/DNN architecture varies depending on the application at hand. CNN/DNNs comprise an enormous design space. Quantitatively, we find that a small region…
15 Citations
PositNN: Tapered Precision Deep Learning Inference for the Edge
- Computer Science
- 2018
This work proposes an ultra-low precision deep neural network, PositNN, that uses posits during inference and outperforms both other low-precision neural networks and a 32-bit floating-point baseline network.
Fine-Grained Energy and Performance Profiling framework for Deep Convolutional Neural Networks
- Computer Science
- 2018
Surprisingly, it is possible to refine the model to predict the number of SIMD instructions and main memory accesses solely from the application's Multiply-Accumulate (MAC) counts, thereby eliminating the need for actual measurements.
PositNN Framework: Tapered Precision Deep Learning Inference for the Edge
- Computer Science
- 2019 IEEE Space Computing Conference (SCC)
- 2019
This work proposes a deep neural network framework, PositNN, that uses the posit numerical format and exact-dot-product operations during inference and outperforms other low-precision neural networks across all tasks.
Layer-Centric Memory Reuse and Data Migration for Extreme-Scale Deep Learning on Many-Core Architectures
- Computer Science
- ACM Trans. Archit. Code Optim.
- 2018
Layrub is a runtime data placement strategy that orchestrates the execution of the training process and achieves layer-centric memory reuse, reducing memory consumption enough to run extreme-scale deep learning workloads that previously could not fit on a single GPU.
Keynote: small neural nets are beautiful: enabling embedded systems with small deep-neural-network architectures
- Computer Science
- 2017 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)
- 2017
This work finds that the basic principles of design space exploration used to develop embedded microprocessor architectures are highly applicable to the design of deep neural network architectures, and uses these principles to create a novel deep neural net called SqueezeNet that requires only 480KB of storage for its model parameters.
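As a purely illustrative aside (not taken from this page), the sketch below shows a minimal PyTorch rendering of a SqueezeNet-style "Fire" module, the building block behind such small parameter budgets; the channel counts are assumptions chosen for illustration rather than the exact published configuration.

```python
# Minimal sketch (PyTorch assumed) of a SqueezeNet-style "Fire" module:
# a 1x1 "squeeze" convolution followed by parallel 1x1 and 3x3 "expand"
# convolutions whose outputs are concatenated along the channel axis.
# Channel counts below are illustrative assumptions.
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # Expand in parallel at 1x1 and 3x3, then concatenate channel-wise.
        return torch.cat(
            [self.relu(self.expand1x1(x)), self.relu(self.expand3x3(x))], dim=1
        )

x = torch.randn(1, 96, 55, 55)        # dummy activation map
print(Fire(96, 16, 64, 64)(x).shape)  # -> torch.Size([1, 128, 55, 55])
```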
Fine-grained energy profiling for deep convolutional neural networks on the Jetson TX1
- Computer Science
- 2017 IEEE International Symposium on Workload Characterization (IISWC)
- 2017
This work presents a novel evaluation framework for measuring energy and performance of deep neural networks, using ARM's Streamline Performance Analyser integrated with standard deep learning frameworks such as Caffe and cuDNN v5.
SyNERGY: An energy measurement and prediction framework for Convolutional Neural Networks on Jetson TX1
- Computer Science
- 2018
This work proposes SyNERGY, a fine-grained (that is, per-layer) energy measurement and prediction framework for deep neural networks on embedded platforms, and finds that it is possible to refine the model to predict the number of SIMD instructions and main memory accesses solely from the application's Multiply-Accumulate (MAC) counts.
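The prediction step described above lends itself to a simple illustration. The sketch below is not taken from the paper; the counter values are made-up assumptions. It shows the general idea of fitting a linear model from per-layer MAC counts to a measured hardware counter and then reusing the fit in place of new measurements.

```python
# Illustrative sketch: predict a hardware counter (e.g. SIMD instruction count)
# from per-layer Multiply-Accumulate (MAC) counts with a linear least-squares fit.
# All numbers are hypothetical placeholders, not measurements from the paper.
import numpy as np

macs = np.array([105e6, 223e6, 150e6, 112e6])  # per-layer MAC counts (assumed)
simd = np.array([27e6, 56e6, 38e6, 29e6])      # measured SIMD instructions (assumed)

# Least-squares fit: simd ≈ a * macs + b
a, b = np.polyfit(macs, simd, deg=1)

# Once fitted, new layers can be "profiled" from their MAC counts alone.
new_layer_macs = np.array([180e6])
print(a * new_layer_macs + b)
```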
A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks
- Computer Science
- 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)
- 2018
This paper sets out to reduce the significant communication cost of distributed training by embedding data compression accelerators in the Network Interface Cards (NICs), and proposes an aggregator-free training algorithm that exchanges gradients in both legs of communication within a group while the workers collectively perform the aggregation in a distributed manner.
Implementing Efficient, Portable Computations for Machine Learning
- Computer Science
- 2017
This work proposes Boda, a framework for implementing artificial neural network computations based on metaprogramming, specialization, and autotuning, and uses it to explore the development of efficient convolution operations across various types of hardware.
An On-Chip Learning Accelerator for Spiking Neural Networks using STT-RAM Crossbar Arrays
- Computer Science
- 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE)
- 2020
This work presents a scheme for implementing learning on a digital non-volatile memory (NVM) based hardware accelerator for Spiking Neural Networks (SNNs), and shows ~20× higher performance per unit watt per unit area compared to a conventional SRAM-based design.
References
Showing 1-10 of 223 references
Convolutional neural networks at constrained time cost
- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
This paper investigates the accuracy of CNNs under constrained time cost and presents an architecture that achieves very competitive accuracy on the ImageNet dataset, yet is 20% faster than "AlexNet" [14] (16.0% top-5 error, 10-view test).
Beyond short snippets: Deep networks for video classification
- Computer Science
- 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- 2015
This work proposes and evaluates several deep neural network architectures that combine image information across a video over longer time periods than previously attempted, and proposes two methods capable of handling full-length videos.
Striving for Simplicity: The All Convolutional Net
- Computer Science
- ICLR
- 2015
It is found that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks.
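As a purely illustrative aside (a sketch under assumed layer sizes, not code from the paper), the substitution can be written in a few lines of PyTorch: a 2x2 max-pooling stage is swapped for a 3x3 convolution with stride 2, so the downsampling is learned rather than fixed.

```python
# Minimal sketch (PyTorch assumed) of the all-convolutional substitution:
# replace a max-pooling layer with a strided convolution that performs the
# same spatial reduction. Channel counts are illustrative assumptions.
import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)  # (batch, channels, height, width)

# Conventional block: 3x3 convolution followed by 2x2 max-pooling.
pooled = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# All-convolutional block: the pooling layer becomes a stride-2 convolution.
strided = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1),
    nn.ReLU(inplace=True),
)

print(pooled(x).shape, strided(x).shape)  # both halve the spatial resolution
```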
Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
- Computer Science
- 2015
Hardware specialization in the form of GPGPUs, FPGAs, and ASICs offers a promising path towards major leaps in processing capability while achieving high energy efficiency, and combining multiple FPGAs over a low-latency communication fabric offers further opportunity to train and evaluate models of unprecedented size and quality.
Going Deeper with Embedded FPGA Platform for Convolutional Neural Network
- Computer Science
- FPGA
- 2016
This paper presents an in-depth analysis of state-of-the-art CNN models, shows that convolutional layers are computation-centric while fully-connected layers are memory-centric, and proposes a CNN accelerator design on an embedded FPGA for ImageNet large-scale image classification.
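The computation-centric versus memory-centric contrast can be made concrete with a back-of-the-envelope count. The sketch below uses assumed, AlexNet-like layer shapes (not figures from the paper) to show that a convolutional layer performs many MACs per stored weight, while a fully-connected layer performs roughly one MAC per weight.

```python
# Back-of-the-envelope sketch: MACs per weight for a convolutional layer vs a
# fully-connected layer. Layer shapes are illustrative assumptions only.

def conv_layer_stats(c_in, c_out, k, h_out, w_out):
    weights = c_in * c_out * k * k
    macs = weights * h_out * w_out   # each weight is reused at every output position
    return weights, macs

def fc_layer_stats(n_in, n_out):
    weights = n_in * n_out
    macs = weights                   # each weight is used exactly once per input
    return weights, macs

layers = {
    "conv 256->384, 3x3, 13x13 output": conv_layer_stats(256, 384, 3, 13, 13),
    "fc 9216->4096": fc_layer_stats(9216, 4096),
}
for name, (w, m) in layers.items():
    print(f"{name}: {w/1e6:.1f}M weights, {m/1e6:.1f}M MACs, {m/w:.0f} MACs/weight")
```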
SpeeDO: Parallelizing Stochastic Gradient Descent for Deep Convolutional Neural Network
- Computer Science
- 2015
SpeeDO uses off-the-shelf hardware to speed up CNN training, aiming to achieve two goals: improving deployability and cost effectiveness, and serving as a benchmark on which software algorithmic approaches can be evaluated and improved.
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
- Computer Science
- FPGA
- 2015
This work implements a CNN accelerator on a VC707 FPGA board, achieving a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, which significantly outperforms previous approaches.
Systematic evaluation of convolution neural network advances on the ImageNet
- Computer Science
- Comput. Vis. Image Underst.
- 2017
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2015
This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges the RPN and Fast R-CNN into a single network by sharing their convolutional features.