Learn More
Embedded elements, such as block multipliers, are increasingly used in advanced field programmable gate array (FPGA) devices to improve efficiency in speed, area and power consumption. A methodology is described for assessing the impact of such embedded elements on efficiency. The methodology involves creating dummy elements, called virtual embedded blocks(More)
This paper presents an architecture for a reconfigurable device that is specifically optimized for floating-point applications. Fine-grained units are used for implementing control logic and bit-oriented operations, while parameterized and reconfigurable word-based coarse-grained units incorporating word-oriented lookup tables and floating-point operations(More)
This paper describes a novel hardware accelerator for Monte Carlo (MC) simulation, and illustrate its implementation in field programmable gate array (FPGA) technology for speeding up financial applications. Our accelerator is based on a generic architecture, which combines speed and flexibility by integrating a pipelined MC core with an on-chip instruction(More)
A hybrid FPGA consists of island-style fine-grained units and domain-specific coarse-grained units. This paper describes an approach to estimate the power consumption of a set of hybrid FPGA architectures. The dynamic power consumption of the fine-grained units is obtained using standard FPGA tools, and the coarse-grained units using standard ASIC tools.(More)
This paper presents a novel architecture for domain-specific FPGA devices. This architecture can be optimised for both speed and density by exploiting domain-specific information to produce efficient reconfigurable logic with multiple granularity. In the reconfigurable logic, general-purpose fine-grained units are used for implementing control logic and(More)
A module generator is described that allows for the generation of synthesizable VHDL modules which implement arbitrary functions in fixed point precision using the Symmetric Table Addition Method (STAM). This module generator was interfaced to a high level synthesis tool “fly” which automatically generates fully-pipelined circuits from a Perl-like language.(More)
A technique for parallelising multiple loops in a heterogeneous computing system is presented. Loops are first unrolled and then broken up into multiple tasks which are mapped to reconfigurable hardware. A performance-driven optimisation is applied to find the best unrolling factor for each loop under hardware size constraints. The approach is demonstrated(More)
We present an architecture for a synthesizable datapath-oriented FPGA core that can be used to provide post-fabrication flexibility to an SoC. Our architecture is optimized for bus-based operations and employs a directional routing architecture, which allows it to be synthesized using standard ASIC design tools and flows. The primary motivation for this(More)
We present an architecture for a synthesizable datapath-oriented Field Programmable Gate Array (FPGA) core which can be used to provide post-fabrication flexibility to a System-on-Chip (SoC). Our architecture is optimized for bus-based operations that are common in signal processing and computation intensive applications. It employs a directional routing(More)