• Publications
Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks
This work implements a CNN accelerator on a VC707 FPGA board and achieves a peak performance of 61.62 GFLOPS at a 100 MHz working frequency, significantly outperforming previous approaches.
High-Level Synthesis for FPGAs: From Prototyping to Deployment
AutoESL's AutoPilot HLS tool, coupled with domain-specific system-level implementation platforms developed by Xilinx, is used as an example to demonstrate the effectiveness of state-of-the-art C-to-FPGA synthesis solutions targeting multiple application domains.
FlowMap: an optimal technology mapping algorithm for delay optimization in lookup-table based FPGA designs
  • J. Cong, Y. Ding
  • Computer Science
  • IEEE Trans. Comput. Aided Des. Integr. Circuits…
  • 8 November 1992
A theoretical breakthrough is presented which shows that the LUT-based FPGA technology mapping problem for depth minimization can be solved optimally in polynomial time.
A thermal-driven floorplanning algorithm for 3D ICs
As the technology progresses, interconnect delays have become bottlenecks of chip performance. 3D integrated circuits are proposed as one way to address this problem. However, thermal problem is a…
Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks
This paper designs and implements Caffeine, a hardware/software co-designed library to efficiently accelerate the entire CNN on FPGAs, with a key focus on bandwidth optimization through memory access reorganization, which was not studied in prior work.
Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs
This paper implements a CNN on an FPGA using a systolic array architecture, which can achieve high clock frequency under high resource utilization. It also provides an analytical model for performance and resource utilization and develops an automatic design space exploration framework.
CMP network-on-chip overlaid with multi-band RF-interconnect
  • M. Chang, J. Cong, +4 authors S. Tam
  • Computer Science
  • IEEE 14th International Symposium on High…
  • 24 October 2008
In this paper, we explore the use of multi-band radio frequency interconnect (or RF-I) with signal propagation at the speed of light to provide shortcuts in a many-core network-on-chip (NoC) mesh…
A scalable micro wireless interconnect structure for CMPs
This paper proposes a recursive wireless interconnect structure called the WCube that features a single transmit antenna and multiple receive antennas at each micro wireless router and offers scalable performance in terms of latency and connectivity.
Bounded-skew clock and Steiner routing
This work studies the minimum-cost bounded-skew routing tree problem under the pathlength (linear) and Elmore delay models and proposes a new Greedy-BST/DME algorithm which combines the merging region computation with topology generation.
Thermal via planning for 3-D ICs
  • J. Cong, Yan Zhang
  • Engineering, Computer Science
  • ICCAD-. IEEE/ACM International Conference on…
  • 31 May 2005
This paper formulates the TTS-via minimization problem with temperature constraints as a constrained nonlinear programming (NLP) problem based on the thermal resistive model and develops an efficient heuristic algorithm, named m-ADVP, which solves a sequence of simplified via planning subproblems in alternating directions in a multilevel framework.