FPGAs vs. CPUs: trends in peak floating-point performance
TLDR
Moore's Law states that the number of transistors on a device doubles every two years; however, it is often (mis)quoted based on its impact on CPU performance.
Closing the gap: CPU and FPGA trends in sustainable floating-point BLAS performance
TLDR
This paper examines three of the basic linear algebra subroutine (BLAS) functions: vector dot product, matrix-vector multiply, and matrix multiply.
Remote Memory Access Programming in MPI-3
TLDR
The Message Passing Interface (MPI) 3.0 standard, introduced in September 2012, includes a significant update to the one-sided communication interface, also known as remote memory access (RMA).
Intel® Omni-path Architecture: Enabling Scalable, High Performance Fabrics
TLDR
The Intel® Omni-Path Architecture (Intel® OPA) is designed to enable a broad class of computations requiring scalable, tightly coupled CPU, memory, and storage resources.
RC-BLAST: towards a portable, cost-effective open source hardware implementation
TLDR
This paper describes an FPGA-based hardware implementation designed to accelerate the BLAST algorithm.
SeaStar Interconnect: Balanced Bandwidth for Scalable Performance
TLDR
The SeaStar, a new ASIC from Cray, is a full system-on-chip design that integrates high-speed serial links, a 3D router, and traditional network interface functionality, including an embedded processor, in a single chip.
The Portals 4.0 Network Programming Interface
TLDR
This report presents a specification for the Portals 4.0 network programming interface, which is intended to allow scalable, high-performance network communication between nodes of a parallel computing system.
A comparison of floating point and logarithmic number systems for FPGAs
TLDR
Many papers have proposed logarithmic number systems (LNS) as an alternative to floating point because of their simpler multiplication, division, and exponentiation computations.
Architectural Modifications to Enhance the Floating-Point Performance of FPGAs
TLDR
This paper considers three architectural modifications that make floating-point operations more efficient on FPGAs.
Embedded floating-point units in FPGAs
TLDR
In this paper, we introduce embedding floating-point multiply-add units in an island-style FPGA.