Piranha: a scalable architecture based on single-chip multiprocessing
This paper describes the Piranha system, a research prototype being developed at Compaq that aggressively exploits chip multiprocessing by integrating eight simple Alpha processor cores along with a...
SimFlex: a fast, accurate, flexible full-system simulation framework for performance evaluation of server architecture
The novelty of SimFlex lies in its combination of a unique, compile-time approach to component interconnection and a methodology for obtaining accurate results from sampled simulations on a platform capable of evaluating unmodified commercial workloads.
PHAST: Hardware-Accelerated Shortest Path Trees
A novel algorithm to solve the nonnegative single-source shortest path problem on road networks and other graphs with low highway dimension that needs fewer operations, has better locality, and is better able to exploit parallelism at multi-core and instruction levels.
Missing the Memory Wall: The Case for Processor/Memory Integration
It is shown that processor/memory integration can be used to build competitive, scalable, and cost-effective MP systems. Results from execution-driven uni- and multiprocessor simulations show that the benefits of lower latency and higher bandwidth can compensate for the restrictions on the size and complexity of the integrated processor.
PHAST: Hardware-accelerated shortest path trees
A novel algorithm to solve the non-negative single-source shortest path problem on road networks and graphs with low highway dimension that needs fewer operations, has better locality, and is better able to exploit parallelism at multi-core and instruction levels.
The S3.mp scalable shared memory multiprocessor
S3.mp (Sun's Scalable Shared memory MultiProcessor) is a research project to demonstrate a low-overhead, high-throughput communication system that is based on cache-coherent distributed shared memory...
S-Connect: from networks of workstations to supercomputer performance
The first version of the S-Connect switching element has been successfully implemented in a commercial 0.65 µm CMOS process.
Impact of chip-level integration on performance of OLTP workloads
The design trade-offs that arise as more system functionality is integrated onto the processor chip are examined, and a number of important architectural choices that are influenced by chip-level integration are identified.
Verifying Distributed Directory-Based Cache Coherence Protocols: S3.mp, a Case Study
This paper presents the results of verifying the S3.mp cache coherence protocol and reports several design errors, including one that appears only in verification models of more than three processing nodes and is therefore very unlikely to be detected by intensive simulation.