Decimal Floating-Point Multiplication
TLDR
This paper presents the design of two decimal floating-point multipliers: one whose partial product accumulation strategy employs decimal carry-save addition and one that employs binary carry-save addition.
Decimal Floating-Point Multiplication Via Carry-Save Addition
TLDR
This paper presents the design of a decimal floating-point multiplier that complies with the specifications for decimal multiplication given in the draft revision of the IEEE 754 standard for floating-point arithmetic (IEEE 754R).
A parallel IEEE P754 decimal floating-point multiplier
TLDR
This paper presents a fully parallel decimal floating-point multiplier compliant with the recent draft of the IEEE P754 Standard for Floating-point Arithmetic (IEEE P754).
Improved combined binary/decimal fixed-point multipliers
TLDR
This paper presents several combined binary/decimal fixed-point multipliers that use the BCD-4221 recoding for the decimal digits.
Experimental Analysis of Matrix Multiplication Functional Units
TLDR
This paper describes a method that exploits the rounding modes and other features of the IEEE 754 standard to gain deeper insight into the design and functionality of matrix multiplication units, and reports findings on their design properties and micro-architecture.
A Combined Decimal and Binary Floating-Point Multiplier
TLDR
In this paper, we describe the first hardware design of a combined binary and decimal floating-point multiplier, based on specifications in the IEEE 754-2008 Floating-point Standard.
Intel Nervana Neural Network Processor-T (NNP-T) Fused Floating Point Many-Term Dot Product
TLDR
In this paper, we describe the details of the MPU pipeline, discuss the trade-offs made in the design, and present information on the accuracy of the computation as compared to traditional FMA implementations.
Vector mask-controlled clock gating for power efficiency of a processor
TLDR
A processor includes an instruction schedule and dispatch unit to receive a single-instruction-multiple-data (SIMD) instruction that performs an operation on multiple data elements stored in a storage location indicated by a first source operand.
Joining adjacent gather / scatter operations
TLDR
A processor includes an instruction decoder to decode a first instruction that gathers data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address that stores a plurality of data elements.
eNuRAPID-A Leakage Power and Wire Latency Aware Cache Design
Power dissipation is becoming an increasingly important factor in the design of modern CPUs ranging from those intended for mobile use up to high-performance server processors. On-chip caches ...