• Publications
Improving Superscalar Instruction Dispatch And Issue By Exploiting Dynamic Code Sequences
Superscalar processors currently have the potential to fetch multiple basic blocks per cycle by employing one of several recently proposed instruction fetch mechanisms. However, this increased fetch…
  • 106
  • 5
Instruction Issue Logic for High-Performance, Interruptable Pipelined Processors
The performance of pipelined processors is severely limited by data dependencies. In order to achieve high performance, a mechanism to alleviate the effects of data dependencies must exist. If a…
  • 58
  • 5
Trace Processors: Moving to Fourth-Generation Microarchitectures
This article proposes a new architecture called "trace processors", which consist of multiple, distributed on-chip processor cores, each of which simultaneously executes a different trace. All but…
  • 60
  • 4
Tradeoffs in instruction format design for horizontal architectures
With recent improvements in software techniques and the enhanced level of fine grain parallelism made available by such techniques, there has been an increased interest in horizontal architectures…
  • 14
  • 2
Dynamic vectorization: a mechanism for exploiting far-flung ILP in ordinary programs
Several ILP limit studies indicate the presence of considerable ILP across dynamically far-apart instructions in program execution. This paper proposes a hardware mechanism, dynamic vectorization…
  • 15
  • 1
Load balancing in a heterogeneous computing environment
Heterogeneous distributed computing is the tuned use of a network of machines of diverse architectures and computational power; by directing individual portions of a parallel program to the…
  • 14
  • 1
Instruction-level characterization of the Cray Y-MP processor
Evolutionary computer architecture design fundamentally relies on empirical knowledge of workload characteristics and of dynamic program usage of machine features. While vector machines have thus far…
  • 6
  • 1
On the instruction-level characteristics of scalar code in highly-vectorized scientific applications
The performance of a program will ultimately be limited by its serial (scalar) portion, as pointed out by Amdahl’s Law. So far, reported studies of instruction-level parallelism have mixed…
  • 4
  • 1
CDE: A Compiler-driven, Dependence-Centric, Eager-executing Architecture for the Billion Transistors Era
We propose an evolutionary new approach to high-performance processor architectures that can scale to use increasing numbers of transistors while meeting the constraints of design complexity, wire…
  • 4
  • 1
Instruction issue logic for high-performance, interruptable pipelined processors
The performance of pipelined processors is severely limited by data dependencies. In order to achieve high performance, a mechanism to alleviate the effects of data dependencies must exist. If a…
  • 47