Edward S. Davidson

Data cache misses reduce the performance of wide-issue processors by stalling the data supply to the processor. Prefetching data by predicting the miss address is one way to tolerate the cache miss latencies. But current applications with irregular access patterns make it difficult to accurately predict the address sufficiently early to mask large cache(More)
Direct-mapped caches are often plagued by conflict misses because they lack the associativity to store more than one memory block in each set. However, some blocks that have no temporal locality actually cause program execution degradation by displacing blocks that do manifest temporal behavior. In this paper, we present a simple but efficient novel(More)
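The conflict-miss behavior described above can be illustrated with a minimal sketch. It is not the scheme proposed in the paper; it only models a plain direct-mapped cache with one block per set, and the function name, the modulo index mapping, and the block addresses are assumptions made for illustration.

```python
def direct_mapped_misses(block_addresses, num_sets):
    """Count misses in a direct-mapped cache (one block per set).

    Assumption: a block maps to set (block address mod num_sets).
    """
    cache = [None] * num_sets  # one resident block per set, initially empty
    misses = 0
    for block in block_addresses:
        index = block % num_sets
        if cache[index] != block:
            misses += 1
            cache[index] = block  # the new block displaces the old one
    return misses


# Blocks 0 and 8 share set 0 in an 8-set cache, so they evict each other.
conflicting = [0, 8, 0, 8, 0, 8]
# Blocks 0 and 1 map to different sets and can both stay resident.
non_conflicting = [0, 1, 0, 1, 0, 1]

print(direct_mapped_misses(conflicting, 8))      # 6: every access misses
print(direct_mapped_misses(non_conflicting, 8))  # 2: only the first touches miss
```

In the conflicting trace every access displaces the block that is needed next, which is the kind of degradation the abstract attributes to the lack of associativity.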
High speed scalar processing is an essential characteristic of high performance general purpose computer systems. Highly concurrent execution of scalar code is difficult due to data dependencies and conditional branches. This paper proposes an architectural concept called guarded instructions to reduce the penalty of conditional branches in(More)
Modulo scheduling is an efficient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirements. We present a set of low computational complexity stage-scheduling heuristics that reduce the register requirements of a given modulo schedule by shifting operations by multiples(More)
In this paper, we give an overview of the Cedar multiprocessor and present recent performance results. These include the performance of some computational kernels and the Perfect Benchmarks. We also present a methodology for judging parallel system performance and apply this methodology to Cedar, Cray YMP-8, and Thinking Machines CM-5.
The difference in processor and main memory cycle time has necessitated the use of aggressive prefetching techniques to reduce or hide main memory access latency. However, prefetching can significantly increase memory bandwidth and unsuccessful prefetches may even pollute the primary cache. Although the current metrics, coverage and accuracy, do provide an(More)
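The two metrics named above can be made concrete with a short sketch. The abstract is truncated before it defines them, so the definitions below are the commonly used ones and are an assumption here: coverage is the fraction of baseline misses removed by useful prefetches, and accuracy is the fraction of issued prefetches that turn out to be useful. The function name and the counts are hypothetical.

```python
def prefetch_metrics(baseline_misses, prefetches_issued, useful_prefetches):
    """Return (coverage, accuracy) for a prefetcher.

    Assumed definitions (not taken from the paper):
      coverage = useful prefetches / misses without prefetching
      accuracy = useful prefetches / prefetches issued
    """
    if baseline_misses <= 0 or prefetches_issued <= 0:
        raise ValueError("baseline_misses and prefetches_issued must be positive")
    coverage = useful_prefetches / baseline_misses
    accuracy = useful_prefetches / prefetches_issued
    return coverage, accuracy


# Hypothetical counts: 1000 misses without prefetching,
# 500 prefetches issued, 400 of them used before eviction.
coverage, accuracy = prefetch_metrics(1000, 500, 400)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Under these definitions neither number captures the extra memory traffic or the cache pollution caused by the 100 unused prefetches, which is the kind of gap the abstract points to.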
More and more scientists and engineers are becoming interested in using supercomputers. Earlier barriers to using these machines are disappearing as software for their use improves. Meanwhile, new parallel supercomputer architectures are emerging that may provide rapid growth in performance. These systems may use a large number of processors with an(More)
Highly aggressive multi-issue processor designs of the past few years and projections for the decade require that we redesign the operation of the cache memory system. The number of instructions that must be processed (including incorrectly predicted ones) will approach 16 or more per cycle. Since memory operations account for about a third of all(More)
A pipeline is defined to be a collection of resources, called segments, which can be kept busy simultaneously. A task, once initiated, flows from segment to segment for its execution. A collision occurs if two or more tasks attempt to use the same segment at the same time. The collision characteristics of a(More)
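The collision definition above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's method: the reservation-table representation (each segment mapped to the cycles in which one task occupies it), the function names, and the three-segment example are all hypothetical.

```python
from itertools import combinations


def forbidden_latencies(reservation_table):
    """Return the positive initiation latencies at which two tasks collide.

    reservation_table maps each segment name to the set of clock cycles,
    relative to task initiation, in which a single task uses that segment.
    Two tasks started k cycles apart collide if some segment is used at
    cycles a and b by one task with b - a == k.
    """
    forbidden = set()
    for cycles in reservation_table.values():
        for a, b in combinations(sorted(cycles), 2):
            forbidden.add(b - a)
    return forbidden


def collides(reservation_table, latency):
    """True if starting a second task `latency` cycles later causes a collision."""
    return latency in forbidden_latencies(reservation_table)


# Hypothetical pipeline: segment S1 is used in cycles 0 and 3 of each task.
table = {"S1": {0, 3}, "S2": {1}, "S3": {2}}

print(forbidden_latencies(table))  # {3}
print(collides(table, 1))          # False
print(collides(table, 3))          # True
```

In this example a second task started three cycles after the first would need S1 in the same cycle as the first task's second use of S1, which is exactly a collision in the sense defined above.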
With the continuing technological trend of ever cheaper and larger memory, most data sets in database servers will soon be able to reside in main memory. In this configuration, the performance bottleneck is likely to be the gap between the processing speed of the CPU and the memory access latency. Previous work has shown that database applications have(More)