A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference

@article{Fleischer2018ASM,
  title={A Scalable Multi-TeraOPS Deep Learning Processor Core for AI Training and Inference},
  author={Bruce M. Fleischer and Sunil Shukla and Matthew M. Ziegler and Joel Silberman and Jinwook Oh and Vijayalakshmi Srinivasan and Jungwook Choi and Silvia Mueller and Ankur Agrawal and Tina Babinsky and Nianzheng Cao and Chia-Yu Chen and Pierce Chuang and Thomas W. Fox and George Gristede and Michael Guillorn and Howard Haynie and Michael Klaiber and Dongsoo Lee and Shih-Hsien Lo and Gary W. Maier and Michael Scheuermann and Swagath Venkataramani and Christos Vezyrtzis and Naigang Wang and Fanchieh Yee and Ching Zhou and Pong-Fei Lu and Brian W. Curran and Leland Chang and Kailash Gopalakrishnan},
  journal={2018 IEEE Symposium on VLSI Circuits},
  year={2018},
  pages={35-36}
}
A multi-TOPS AI core is presented for acceleration of deep learning training and inference in systems from edge devices to data centers. With a programmable architecture and custom ISA, this engine achieves >90% sustained utilization across the range of neural network topologies by employing a dataflow architecture and an on-chip scratchpad hierarchy. Compute precision is optimized at 16b floating point (fp16) for high model accuracy in training and inference as well as 1b/2b (binary/ternary…
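
The core's ISA and datapath are not detailed in this excerpt, so as a rough illustration of the fp16 compute precision the abstract describes, here is a minimal NumPy sketch of a reduced-precision matrix multiply. The fp32 accumulation width and the layer shapes are assumptions for illustration, not details from the paper.

```python
import numpy as np

def fp16_matmul_fp32_acc(a, b):
    """Multiply fp16 operands while accumulating in fp32.

    Models a common reduced-precision scheme: operands are stored
    and multiplied at 16b, while partial sums are kept at 32b to
    limit rounding error. The 32b accumulation is an assumption;
    the excerpt only specifies fp16 compute precision.
    """
    a16 = a.astype(np.float16)
    b16 = b.astype(np.float16)
    # Upcast to fp32 so the dot-product accumulation runs at 32b.
    return a16.astype(np.float32) @ b16.astype(np.float32)

# Hypothetical layer shapes, for illustration only.
x = np.random.randn(64, 512)   # activations
w = np.random.randn(512, 256)  # weights
y_fp16 = fp16_matmul_fp32_acc(x, w)
y_fp32 = x.astype(np.float32) @ w.astype(np.float32)
print("max abs error vs. fp32:", np.max(np.abs(y_fp16 - y_fp32)))
```

A common motivation for such a scheme is that 16b operand storage halves memory bandwidth while the wider accumulation preserves model accuracy over long dot products.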


Key Quantitative Results

  • With a programmable architecture and custom ISA, the engine achieves >90% sustained utilization across a broad range of neural network topologies by combining a dataflow architecture, which is particularly effective for deep learning workloads, with an on-chip scratchpad hierarchy (a sketch of how sustained utilization can be computed follows below).
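
The excerpt does not define how sustained utilization is measured; a common convention, assumed here, is achieved throughput divided by the core's peak throughput. A minimal sketch under that assumption:

```python
def sustained_utilization(ops_executed, elapsed_s, peak_tops):
    """Fraction of peak throughput actually achieved.

    ops_executed: total operations completed (multiply-accumulates
                  commonly counted as 2 ops each).
    elapsed_s:    wall-clock run time in seconds.
    peak_tops:    the core's peak throughput in tera-ops/second.
    """
    achieved_tops = ops_executed / elapsed_s / 1e12
    return achieved_tops / peak_tops

# Hypothetical numbers for illustration: a core with 1.5 peak TOPS
# completing 1.4e12 ops in one second sustains ~93% utilization.
print(sustained_utilization(1.4e12, 1.0, 1.5))
```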

Citations

Publications citing this paper.
Showing 3 of 9 citations.

A Compiler for Deep Neural Network Accelerators to Generate Optimized Code for a Wide Range of Data Parameters from a Hand-crafted Computation Kernel

  • 2019 IEEE Symposium on Low-Power and High-Speed Chips (COOL CHIPS)
  • 2019
Cites methods and background. Highly influenced.

Future Computing Hardware for AI

  • 2018 IEEE International Electron Devices Meeting (IEDM)
  • 2018
Cites background. Highly influenced.

Analog-to-Digital Conversion With Reconfigurable Function Mapping for Neural Networks Activation Function Acceleration

  • IEEE Journal on Emerging and Selected Topics in Circuits and Systems
  • 2019
Cites background.
