Hwa C. Torng

Processors with multiple functional units, such as CRAY-1, Cyber 205, and FPS 164, have been used for high-end scientific computation tasks. Much effort has been put into increasing the throughput of such systems. One critical consideration in their design is the identification and implementation of a suitable instruction issuing scheme. Existing approaches...
This paper presents the design of a fast inner product processor, with appreciably reduced latency and cost. The inner product processor is implemented with a tree of carry-propagate or carry-save adders; this tree is obtained with the incorporation of three innovations in the conventional multiply/add tree: (1) The leaf-multipliers are expanded into adder...
The achievement of fast, precise interrupts and the implementation of multiple levels of branch predictions are two of the problems associated with the dynamic scheduling of instructions for superscalar processors. Their solution is especially difficult if short cycle time operation is desired. We present solutions to these problems through the development...
Inner product computation is an important operation, invoked repeatedly in matrix multiplications. A high-speed inner product processor can be very useful (among many possible applications) in real-time signal processing. This paper presents the design of a fast inner product processor, with appreciably reduced latency and cost. The inner product processor...