• Corpus ID: 56887657

Design and implementation of an out-of-order execution engine of floating-point arithmetic operations

  Cristóbal Ramírez Lazo
This thesis covers the design, in hardware description languages, and the FPGA implementation of an out-of-order execution engine for floating-point arithmetic operations. The work is part of a project called Lagarto.



Efficient Implementation of IEEE Double Precision Floating-Point Multiplier on FPGA

  • M. Jaiswal, N. Chandrachoodan
  • Computer Science
    2008 IEEE Region 10 and the Third international Conference on Industrial and Information Systems
  • 2008
Comparisons against the best reported multipliers in the literature show that the proposed module can outperform them, and gives excellent performance with efficient use of resources.

Analysis of high-performance floating-point arithmetic on FPGAs

The impact of floating-point units on the design of an energy efficient architecture for the matrix multiply kernel is discussed and it is shown that FPGAs are capable of achieving up to 6x improvement in terms of the GFLOPS/W metric over that of general purpose processors.

High Performance FPGA Implementation of Double Precision Floating Point Adder/Subtractor

The proposed design optimizes the individual complex components of the adder module (such as the dynamic shifter, leading one detector (LOD), and priority encoder) to achieve a better overall implementation.

Analysis and Implementation of a Novel Leading Zero Anticipation Algorithm for Floating-Point Arithmetic Units

A novel LZA algorithm is investigated allowing us to remove error correction circuitry by reducing the error rate below a commonly accepted limit for image processing applications, which is not achieved by previous techniques.

A Study on the Floating-Point Adder in FPGAS

  • A. Malik, S. Ko
  • Computer Science
    2006 Canadian Conference on Electrical and Computer Engineering
  • 2006
This research studied and implemented standard, LOP, and far and close data-path floating-point addition algorithms, each of which has complex sub-operations that contribute significantly to the overall latency of the design.

Design and implementation of reciprocal unit

The presented design utilizes a 27 times 16 bits ROM followed by two Newton-Raphson iterations to achieve the 52-bit accuracy approximation of the reciprocal of a double precision floating-point number.

Low-Power Leading-Zero Counting and Anticipation Logic for High-Speed Floating Point Units

New boolean relations for the bits of the leading-zero count are derived that allow their computation to be performed using standard carry-lookahead techniques.

Design and implementation of reciprocal unit using table look-up and Newton-Raphson iteration

The design and implementation of a reciprocal unit is presented, which computes the reciprocal of a double-precision floating-point number in eleven clock cycles using a 2^10 × 20 bits ROM followed by two Newton-Raphson iterations.
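
The table look-up plus Newton-Raphson scheme named in this summary can be illustrated in software. The following is a minimal sketch, not the paper's hardware design: the table contents (reciprocals of interval midpoints), the names `ROM`, `build_table` and `reciprocal`, and the use of Python floats in place of fixed-point datapaths are all assumptions made for illustration. Only the table size (2^10 entries of 20 bits) and the two iterations come from the summary.

```python
import math

TABLE_BITS = 10   # 2**10 table entries, as stated in the summary
VALUE_BITS = 20   # 20 bits stored per entry, as stated in the summary


def build_table():
    """Hypothetical ROM contents: 1/midpoint of each mantissa interval,
    quantized to VALUE_BITS fractional bits (an assumption)."""
    table = []
    for i in range(1 << TABLE_BITS):
        mid = 1.0 + (i + 0.5) / (1 << TABLE_BITS)
        table.append(round((1.0 / mid) * (1 << VALUE_BITS)))
    return table


ROM = build_table()


def reciprocal(d, iterations=2):
    """Approximate 1/d for a finite, nonzero, normal double."""
    if d == 0.0 or math.isinf(d) or math.isnan(d):
        raise ValueError("d must be finite and nonzero")
    # Split |d| into mantissa m in [1, 2) and exponent e, so |d| = m * 2**e.
    m, e = math.frexp(abs(d))   # frexp returns m in [0.5, 1)
    m *= 2.0
    e -= 1
    # The top TABLE_BITS fraction bits of the mantissa select the seed.
    index = int((m - 1.0) * (1 << TABLE_BITS))
    x = ROM[index] / (1 << VALUE_BITS)
    # Newton-Raphson step for f(x) = 1/x - m: x <- x * (2 - m * x).
    # Each step roughly squares the relative error of the estimate.
    for _ in range(iterations):
        x = x * (2.0 - m * x)
    result = math.ldexp(x, -e)  # restore the exponent: 1/d = (1/m) * 2**-e
    return -result if d < 0 else result
```

Calling `reciprocal(3.0)` returns a value close to 1/3. The accuracy reached after two iterations depends on the seed table, so this sketch makes no claim about matching the bit accuracy reported for the hardware unit.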

A partitioned instruction queue to reduce instruction wakeup energy

This paper proposes a new mechanism for instruction wakeup, which uses a partitioned instruction queue (IQ) that is shown to require as little as 1.5 comparisons per committed instruction for SPEC2000 benchmarks.

Multiple-banked register file architectures

This paper proposes a register file architecture composed of multiple banks, which provides low latency and simple bypass logic, and shows that a two-level organization degrades IPC but increases performance by 87% and 92% when the register file access time is factored in.