Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks
This work presents a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering the FPGA's resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth.
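The design space exploration described above can be illustrated with a toy sketch: enumerate candidate parallelism factors, discard configurations that exceed the FPGA's resource budgets, and keep the throughput-maximizing survivor. The resource and throughput models below (names, budgets, and cost formulas) are invented placeholders for illustration, not the paper's actual analytical models.

```python
# Toy design-space exploration: pick the highest-throughput (PE, vector)
# configuration that fits assumed DSP and on-chip-memory budgets.
from itertools import product

DSP_BUDGET = 1024       # assumed number of available multipliers
BRAM_KB_BUDGET = 2048   # assumed on-chip buffer capacity (KB)

def explore(pe_options, vec_options):
    best = None
    for pe, vec in product(pe_options, vec_options):
        dsps = pe * vec                # toy model: one DSP per MAC lane
        bram = 4 * pe + 16 * vec       # toy buffer-size model (KB)
        if dsps > DSP_BUDGET or bram > BRAM_KB_BUDGET:
            continue                   # violates a resource constraint
        throughput = pe * vec          # toy model: MACs per cycle
        if best is None or throughput > best[0]:
            best = (throughput, pe, vec)
    return best

print(explore([2, 4, 8, 16, 32], [4, 8, 16, 32, 64]))
```

Real methodologies replace these toy cost functions with calibrated models of memory bandwidth, buffer sizing, and pipeline utilization, but the search structure is the same.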
A 45nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons
- Jae-sun Seo, B. Brezzo, D. Friedman
- Computer Science, Biology · IEEE Custom Integrated Circuits Conference (CICC)
- 20 October 2011
A new architecture is proposed to enable scalable learning algorithms for networks of spiking neurons in silicon by combining innovations in computation, memory, and communication, leveraging robust digital neuron circuits and novel transposable SRAM arrays.
Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks
This work systematically explores the trade-offs of hardware cost by searching the design variable configurations, and proposes a specific dataflow of hardware CNN acceleration to minimize memory access and data movement while maximizing resource utilization to achieve high performance.
XNOR-RRAM: A scalable and parallel resistive synaptic architecture for binary neural networks
- Xiaoyu Sun, Shihui Yin, Xiaochen Peng, R. Liu, Jae-sun Seo, Shimeng Yu
- Computer Science · Design, Automation & Test in Europe Conference…
- 19 March 2018
This work proposes an RRAM synaptic architecture with a bit-cell design of complementary word lines that implements equivalent XNOR and bit-counting operations in a parallel fashion, investigates the impact of sensing offsets on classification accuracy, and analyzes various design options with different sub-array sizes and sensing bit-levels.
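The XNOR-and-bit-count operation that this architecture (and the XNOR-SRAM macro below) evaluates in parallel on the bitlines is mathematically a binary dot product. A minimal software sketch of the identity, with illustrative inputs:

```python
# For +/-1 vectors encoded as 0/1 bits (bit 1 -> +1, bit 0 -> -1),
# the dot product equals 2 * popcount(XNOR(a, b)) - n, which is what
# XNOR bit-cells plus bit-counting compute in hardware.

def binary_dot(a_bits, b_bits):
    n = len(a_bits)
    matches = sum(1 for a, b in zip(a_bits, b_bits) if a == b)  # XNOR + count
    return 2 * matches - n

# Cross-check against explicit +/-1 arithmetic
a, b = [1, 0, 1, 1], [1, 1, 0, 1]
signed = lambda bits: [1 if x else -1 for x in bits]
assert binary_dot(a, b) == sum(x * y for x, y in zip(signed(a), signed(b)))
```

This equivalence is why binary neural network inference maps so naturally onto dense XNOR bit-cell arrays: one multiply-accumulate per synapse collapses into a match/mismatch count along a bitline.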
Mitigating effects of non-ideal synaptic device characteristics for on-chip learning
- Pai-Yu Chen, Binbin Lin, Shimeng Yu
- Computer Science · IEEE/ACM International Conference on Computer…
- 2 November 2015
This study shows that the recognition accuracy of MNIST handwritten digits degrades from ~97% to ~65%, and proposes mitigation strategies: smart programming schemes for achieving linear weight update, a dummy column to eliminate the off-state current, and the use of multiple cells per weight element to alleviate the impact of device variations.
An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks
- Yufei Ma, Yu Cao, S. Vrudhula, Jae-sun Seo
- Computer Science · 27th International Conference on Field…
- 1 September 2017
This work presents an RTL-level CNN compiler that automatically generates customized FPGA hardware for the inference tasks of various CNNs, in order to enable high-level fast prototyping of CNNs from software to FPGAs and still keep the benefits of low-level hardware optimization.
XNOR-SRAM: In-Memory Computing SRAM Macro for Binary/Ternary Deep Neural Networks
- Zhewei Jiang, Shihui Yin, Mingoo Seok, Jae-sun Seo
- Computer Science · IEEE Symposium on VLSI Technology
- 1 June 2018
We present an in-memory computing SRAM macro that computes XNOR-and-accumulate in binary/ternary deep neural networks on the bitline without row-by-row data access. It achieves 33X better energy and…
Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA
- Yufei Ma, Yu Cao, S. Vrudhula, Jae-sun Seo
- Computer Science · IEEE Transactions on Very Large Scale Integration…
- 3 April 2018
This paper quantitatively analyzes and optimizes the design objectives of the CNN accelerator based on multiple design variables, and proposes a specific dataflow of hardware CNN acceleration to minimize data communication while maximizing resource utilization to achieve high performance.
Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA
- Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, S. Vrudhula
- Computer Science · 26th International Conference on Field…
- 1 August 2016
This work quantitatively analyzes the compiler's design strategy to optimize the throughput of a given CNN model under the FPGA resource constraints, and demonstrates the promise of the automatic compiler solution for modularized and scalable hardware acceleration of deep learning.
Specifications of Nanoscale Devices and Circuits for Neuromorphic Computational Systems
It is shown that neuromorphic systems based on new nanoscale devices can potentially improve density and power consumption by at least a factor of 10, as compared with conventional CMOS implementations.