Jim Nilsson

System-level simulators allow computer architects and system software designers to recreate an accurate and complete replica of the program behavior of a target system, regardless of the availability, existence, or instrumentation support of such a system. Applications include evaluation of architectural design alternatives as well as software engineering …
It is a common belief that computer performance growth is over 50% annually, or that performance doubles every 18-20 months. By analyzing publicly available results from the SPEC integer (CINT) benchmark suites, we conclude that this was true between 1985 and 1996, the early years of the RISC paradigm. During the last 7.5 years (1996-2004), however, …
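The two phrasings of the "common belief" are nearly the same claim. A constant annual growth rate g doubles performance every 12·ln 2 / ln(1 + g) months, so 50% per year corresponds to roughly 20.5 months, while doubling every 18 to 20 months corresponds to about 52 to 59% per year. The sketch below just does that arithmetic; the helper names are illustrative and not taken from the paper.

```python
import math


def doubling_time_months(annual_growth: float) -> float:
    """Months needed to double performance at a constant annual growth rate.

    annual_growth is a fraction, e.g. 0.50 for 50% per year.
    """
    return 12 * math.log(2) / math.log(1 + annual_growth)


def annual_growth_from_doubling(months: float) -> float:
    """Annual growth rate (as a fraction) implied by doubling every `months` months."""
    return 2 ** (12 / months) - 1


print(f"{doubling_time_months(0.50):.1f} months")  # 50%/year -> 20.5 months
print(f"{annual_growth_from_doubling(18):.1%}")    # doubling every 18 months -> 58.7%
print(f"{annual_growth_from_doubling(20):.1%}")    # doubling every 20 months -> 51.6%
```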
Two-level coherence predictors have shown great promise in reducing coherence overhead in shared-memory multiprocessors. However, to be accurate they require a memory overhead that, on a 64-processor machine for example, can be as high as 50%. Based on an application case study consisting of seven applications from SPLASH-2, a first observation made in this paper is …
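The abstract does not spell out how the predictor is organized, so the following is only a generic sketch under that assumption: a first level records, per memory block, the last few coherence messages seen, and a second level maps each such history to the message that followed it. Because the pattern table is kept per block, its size grows with the number of blocks and distinct histories, which gives some intuition for where the memory overhead comes from. The class name, message strings, and block address are hypothetical.

```python
from collections import defaultdict, deque


class TwoLevelPredictor:
    """Toy two-level coherence-message predictor (illustrative only).

    Level 1: per-block history of the last `depth` messages for that block.
    Level 2: per-block pattern table mapping a history tuple to the message
    that followed it the last time that history was seen.
    """

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.history = defaultdict(lambda: deque(maxlen=depth))
        self.patterns = defaultdict(dict)  # block -> {history tuple: next message}

    def predict(self, block):
        """Predicted next message for `block`, or None if the history is unseen."""
        return self.patterns[block].get(tuple(self.history[block]))

    def update(self, block, msg):
        """Train on the message that actually arrived, then shift it into the history."""
        hist = tuple(self.history[block])
        if len(hist) == self.depth:
            self.patterns[block][hist] = msg
        self.history[block].append(msg)

    def table_entries(self) -> int:
        """Total pattern-table entries: a rough proxy for memory overhead."""
        return sum(len(t) for t in self.patterns.values())


# Producer-consumer sharing on one hypothetical block: P0 writes, P1 reads, repeatedly.
p = TwoLevelPredictor(depth=2)
hits = 0
for msg in ["P0:write", "P1:read"] * 4:
    if p.predict(0x40) == msg:
        hits += 1
    p.update(0x40, msg)
print(hits, p.table_entries())  # 4 2
```

The first four messages only warm up the tables; once both histories have been seen, every later message is predicted correctly.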
We propose a powerful hardware architecture for pixel shading, which enables flexible control of shading rates and automatic shading reuse between triangles in tessellated primitives. The main goal is efficient pixel shading for moderately to finely tessellated geometry, which is not handled well by current GPUs. Our method effectively decouples the cost of …
Parallel programs that modify shared data in a cache-coherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisitions at writes to shared data. This can have a significant impact on performance in a cache-coherent non-uniform memory architecture (NUMA) multiprocessor. By combining a …
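To make "ownership acquisitions at writes to shared data" concrete, here is a toy model of a single cache block under a basic MSI write-invalidate protocol, assuming no Exclusive state and ignoring timing entirely. It is not the paper's protocol or its proposed technique. In a migratory pattern where each processor loads and then stores the block, every load-store sequence costs one read miss followed by a separate ownership acquisition.

```python
from enum import Enum


class State(Enum):
    I = "Invalid"
    S = "Shared"
    M = "Modified"


class ToyMSI:
    """Minimal MSI write-invalidate model for ONE cache block (illustrative only)."""

    def __init__(self, n_procs: int):
        self.state = [State.I] * n_procs
        self.read_misses = 0
        self.ownership_acquisitions = 0  # write misses plus S->M upgrades

    def read(self, p: int):
        if self.state[p] == State.I:
            self.read_misses += 1
            # A Modified copy elsewhere is downgraded to Shared when it supplies the data.
            self.state = [State.S if s == State.M else s for s in self.state]
            self.state[p] = State.S

    def write(self, p: int):
        if self.state[p] != State.M:
            self.ownership_acquisitions += 1
            # Invalidate every other copy, then take exclusive ownership.
            self.state = [State.I] * len(self.state)
            self.state[p] = State.M


c = ToyMSI(n_procs=4)
for p in [0, 1, 2, 3] * 2:  # migratory sharing: each CPU loads, then stores
    c.read(p)
    c.write(p)
print(c.read_misses, c.ownership_acquisitions)  # 8 8
```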
Instruction-level simulation techniques are the predominant approach to evaluating the impact of architectural design alternatives on the performance of computer systems. Previous simulation approaches have not been capable of executing unmodified system software as well as application software at an acceptable performance level. Commercial applications, such as …
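At its core, an instruction-level simulator repeatedly fetches, decodes, and executes one target instruction at a time while updating architectural state and statistics. The sketch below shows only that loop, for a hypothetical four-opcode ISA invented for illustration; a full system-level simulator would additionally model memory management, devices, and interrupts so that unmodified operating systems can run.

```python
def run(program, max_steps=1000):
    """Interpret a tiny hypothetical ISA one instruction at a time.

    Instructions are tuples:
      ("li", rd, imm)       rd <- imm
      ("add", rd, rs, rt)   rd <- rs + rt
      ("bnez", rs, target)  if rs != 0: pc <- target
      ("halt",)
    Returns (registers, number of instructions executed).
    """
    regs = [0] * 4
    pc = 0
    executed = 0
    while executed < max_steps:
        op, *args = program[pc]  # fetch and decode
        executed += 1
        pc += 1
        if op == "li":
            rd, imm = args
            regs[rd] = imm
        elif op == "add":
            rd, rs, rt = args
            regs[rd] = regs[rs] + regs[rt]
        elif op == "bnez":
            rs, target = args
            if regs[rs] != 0:
                pc = target
        elif op == "halt":
            break
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs, executed


# Sum 3 + 2 + 1 with a countdown loop.
prog = [
    ("li", 0, 0),      # r0 = sum
    ("li", 1, 3),      # r1 = counter
    ("li", 2, -1),     # r2 = decrement
    ("add", 0, 0, 1),  # loop: sum += counter
    ("add", 1, 1, 2),  # counter -= 1
    ("bnez", 1, 3),    # repeat while counter != 0
    ("halt",),
]
regs, executed = run(prog)
print(regs, executed)  # [6, 0, -1, 0] 13
```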
On-line transaction processing exhibits poor memory behavior in high-end multiprocessor servers because of complex sharing patterns and substantial interaction between the database server and the operating system. One contributing source is a large number of load-store sequences in the program, resulting in many read misses as well as much global …
Many current studies use scientific and engineering applications as benchmarks to study shared-memory multiprocessors. This, however, contrasts sharply with reality, where most high-end systems run only commercial applications such as database engines and web servers. Despite the pressing need to understand these applications, progress has been …