From OpenCL to High-Performance Hardware on FPGAs
TLDR
We present an OpenCL compilation framework to generate high-performance hardware for FPGAs, and present the throughput and area results for each application.
  • 223
  • 28
An OpenCL™ Deep Learning Accelerator on Arria 10
TLDR
We show a novel architecture written in OpenCL™, which we refer to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth.
  • 125
  • 16
An OpenCL™ Deep Learning Accelerator on Arria 10
TLDR
We show a novel architecture written in OpenCL™, which we refer to as a Deep Learning Accelerator (DLA), that maximizes data reuse and minimizes external memory bandwidth to significantly boost FPGA performance.
  • 70
  • 16
Serializability of Transactions in Software Transactional Memory
The use of two-phase locking (2PL) to enforce serialization in today’s Software Transactional Memory (STM) systems leads to poor performance for programs with long-running transactions and…
  • 51
  • 7
Hardware Support for Relaxed Concurrency Control in Transactional Memory
TLDR
In this paper, we discuss how a relaxed concurrency control algorithm can be efficiently implemented in hardware on top of a base hardware transactional memory system that provides support for isolation and conflict detection.
  • 26
  • 4
OpenCL for FPGAs: Prototyping a Compiler
TLDR
We present a framework to support OpenCL compilation to FPGAs and present the results on a set of benchmark applications.
  • 36
  • 3
Relaxed Concurrency Control in Software Transactional Memory
TLDR
We propose the use of a more relaxed concurrency control algorithm based on conflict-serializability (CS) to provide better concurrency.
  • 15
  • 2
ST Journal of Research, Volume 1, Number 2: Processor Architecture and Compilation for Embedded Systems
Copyright © IEEE, 2004. Reprinted, with permission, from "A Multi-Level Computing Architecture for Embedded Multimedia Applications" by Faraydon Karim, Alain Mellan, Anh Nguyen (STMicroelectronics), …
  • 6
  • 2
A Multilevel Computing Architecture for Embedded Multimedia Applications
TLDR
We have designed a new architecture that simplifies integration of heterogeneous IP for multimedia and streaming applications using superscalar techniques to exploit task-level parallelism among different processing units.
  • 43
  • 1
In-Package Domain-Specific ASICs for Intel® Stratix® 10 FPGAs: A Case Study of Accelerating Deep Learning Using TensorTile ASIC
TLDR
We propose TensorTile ASICs for Stratix 10 FPGAs to provide ASIC-level tensor performance, while relying on the FPGA's flexibility for application-specific operations.
  • 12
  • 1