Learn More
To achieve speedup on multi-node, multi-GPU computing platforms, it is necessary to overcome performance bottlenecks in networks based on Ethernet or InfiniBand. This paper describes an FPGA implementation of a custom network interface for an optical link between the PCIe buses of compute nodes. The implementation uses an Altera Stratix IV chip with integrated …
FPGAs have the advantage that a single component can be configured post-fabrication to implement almost any computation. However, designing a one-size-fits-all memory architecture causes an inherent mismatch between the needs of the application and the memory sizes and placement on the architecture. Nonetheless, we show that an energy-balanced design for …
The energy in FPGA computations can be dominated by data communication energy, either in the form of memory references or data movement on interconnect (e.g., over 75% of energy for single-processor Gaussian Mixture Modeling, Window Filtering, and FFT). In this paper, we explore how to use data placement and parallelism to reduce communication energy. We …
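The trade-off this abstract describes can be illustrated with a toy model: splitting the working set across more processing elements shrinks each memory bank (cheaper references) but lengthens interconnect paths (costlier data movement), so total energy has a minimum at some level of parallelism. The sketch below is only an illustration; the energy functions, constants, and the 10% remote-access fraction are assumptions, not figures from the paper.

```python
# Toy model (not from the paper): total energy for a computation mapped onto
# P processing elements, each with a private memory bank. Reference energy per
# access shrinks as banks get smaller, while interconnect energy grows with the
# average distance data must travel on a sqrt(P) x sqrt(P) mesh.

import math

N_ACCESSES = 1_000_000      # total data accesses performed by the computation
TOTAL_WORDS = 65_536        # working set, split evenly across P banks

def memory_energy_per_access(words_per_bank):
    # Assume access energy grows roughly with sqrt(bank capacity)
    # (wordline/bitline length), normalized to 1.0 for a 1K-word bank.
    return math.sqrt(words_per_bank / 1024)

def interconnect_energy_per_access(p):
    # Assume 10% of accesses cross the network, paying a cost proportional
    # to the average hop count on the mesh.
    remote_fraction = 0.1
    avg_hops = math.sqrt(p)
    return remote_fraction * 0.5 * avg_hops

def total_energy(p):
    bank = TOTAL_WORDS / p
    per_access = memory_energy_per_access(bank) + interconnect_energy_per_access(p)
    return N_ACCESSES * per_access

if __name__ == "__main__":
    for p in (1, 4, 16, 64, 256, 1024):
        print(f"P={p:5d}  relative energy = {total_energy(p):,.0f}")
```

Running it shows energy falling as P grows, then rising again once interconnect costs dominate, which is the qualitative behavior behind an "optimal level of parallelism".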
Hardware interlocks that enforce semantic invariants and allow fine-grained privilege separation can be built with reasonable costs given modern semiconductor technology. In the common error-free case, these mechanisms operate largely in parallel with the intended computation, monitoring the semantic intent of the computation on an operation-by-operation …
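As a minimal software sketch of the interlock idea, the code below checks one semantic invariant (a store must stay inside the region its pointer was derived from) alongside every operation and intervenes only on a violation. The `BoundedPointer` metadata and `CheckedMemory` class are illustrative assumptions, not the paper's hardware mechanism.

```python
# Illustrative sketch (not the paper's hardware): an "interlock" that checks a
# semantic invariant in parallel with each operation, trapping only on error.

class InterlockError(Exception):
    pass

class BoundedPointer:
    """Pointer carrying the base/limit metadata the interlock checks."""
    def __init__(self, base, limit):
        self.base, self.limit = base, limit

class CheckedMemory:
    def __init__(self, size):
        self.cells = [0] * size

    def store(self, ptr: BoundedPointer, offset: int, value: int):
        addr = ptr.base + offset
        # Interlock: conceptually runs alongside the store and only acts
        # in the (rare) error case.
        if not (ptr.base <= addr < ptr.limit):
            raise InterlockError(f"store to {addr} outside [{ptr.base}, {ptr.limit})")
        self.cells[addr] = value

mem = CheckedMemory(1024)
buf = BoundedPointer(base=100, limit=110)   # a 10-cell buffer
mem.store(buf, 3, 42)                       # in bounds: proceeds normally
# mem.store(buf, 12, 7)                     # would raise InterlockError
```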
As technology feature sizes shrink, aggressive voltage scaling is required to contain power density. However, this also increases the rate of transient upsets, potentially preventing us from scaling down voltage and possibly even requiring voltage increases to maintain reliability. Duplication with checking and triple-modular redundancy are traditional …
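The two traditional techniques named here are easy to state concretely. The sketch below models them over Python callables purely for illustration; in hardware they are replicated logic blocks feeding a voter or comparator.

```python
# Function-level sketch of the two traditional redundancy schemes named above.

def tmr(op, *args):
    """Triple-modular redundancy: run three copies and majority-vote."""
    a, b, c = op(*args), op(*args), op(*args)
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise RuntimeError("no majority: more than one copy failed")

def dup_with_check(op, *args):
    """Duplication with checking: run two copies, flag any mismatch."""
    a, b = op(*args), op(*args)
    if a != b:
        raise RuntimeError("mismatch detected; retry or roll back")
    return a

print(tmr(lambda x, y: x + y, 2, 3))             # 5
print(dup_with_check(lambda x, y: x * y, 4, 5))  # 20
```

TMR masks a single faulty copy outright, while duplication with checking only detects the fault and relies on retry or rollback, which is why the two sit at different cost/coverage points.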
We propose three methods for reducing power consumption in high-performance FPGAs (field-programmable gate arrays). We show that by using continuous hierarchy memory, lightweight checks, and lower chip voltage for near-threshold computation, we can both reduce power consumption and increase reliability without a decrease in throughput. We have …
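A back-of-the-envelope model shows why lightweight checks can pay for themselves at reduced voltage: dynamic energy scales roughly with V^2, while detected upsets only cost an occasional retry. The CV^2 scaling is standard, but the specific error-rate curve and 5% check overhead below are assumptions for illustration only, not the paper's measurements.

```python
# Toy trade-off (illustrative numbers only): lowering V saves energy per attempt
# but raises the transient-upset probability, so detected errors are re-executed.

import math

CHECK_OVERHEAD = 0.05          # assumed 5% energy cost of the lightweight check

def upset_probability(v):
    # Assumed exponential rise in upset rate as V approaches threshold.
    return min(1.0, 1e-6 * math.exp(8 * (1.0 - v)))

def energy_per_useful_op(v):
    e_attempt = v ** 2 * (1 + CHECK_OVERHEAD)          # energy per attempt, incl. check
    expected_attempts = 1 / (1 - upset_probability(v)) # retries on detected upsets
    return e_attempt * expected_attempts

if __name__ == "__main__":
    for v in (1.0, 0.9, 0.8, 0.7, 0.6):
        print(f"V={v:.1f}  relative energy/op = {energy_per_useful_op(v):.3f}")
```

Under these assumed numbers the retry rate stays tiny while energy per operation drops with the square of the voltage, which is the intuition behind pairing near-threshold operation with cheap error checks.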
The energy in FPGA computations is dominated by data communication energy, either in the form of memory references or data movement on interconnect. In this article, we explore how to use data placement and parallelism to reduce communication energy. We show that parallelism can reduce energy and that the optimal level of parallelism increases with the …