It is a common belief that computer performance grows by over 50% annually, or that performance doubles every 18-20 months. By analyzing publicly available results from the SPEC integer (CINT) benchmark suites, we conclude that this was true between 1985 and 1996, the early years of the RISC paradigm. During the last 7.5 years (1996-2004), however, …
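The two phrasings of the growth belief are roughly interchangeable. The short check below is our own arithmetic sketch, not taken from the paper. It converts a doubling period into an annual growth rate and back. The function names are illustrative only.

```python
import math


def annual_growth_from_doubling(months: float) -> float:
    """Annual growth rate implied by performance doubling every `months` months."""
    return 2 ** (12 / months) - 1


def doubling_months_from_growth(rate: float) -> float:
    """Doubling period, in months, implied by a constant annual growth `rate`."""
    return 12 * math.log(2) / math.log(1 + rate)


if __name__ == "__main__":
    for months in (18, 20):
        print(f"doubling every {months} months -> "
              f"{annual_growth_from_doubling(months):.1%} per year")
    print(f"50% per year -> doubling every "
          f"{doubling_months_from_growth(0.5):.1f} months")
```

Doubling every 18 months corresponds to about 58.7% per year, and doubling every 20 months to about 51.6%. Conversely, 50% annual growth doubles performance roughly every 20.5 months.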
System-level simulators allow computer architects and system software designers to recreate an accurate and complete replica of the program behavior of a target system, regardless of the availability, existence, or instrumentation support of such a system. Applications include evaluation of architectural design alternatives as well as software engineering …
Two-level coherence predictors have shown great promise in reducing coherence overhead in shared-memory multiprocessors. However, to be accurate they require a memory overhead that, on a 64-processor machine for example, can be as high as 50%. Based on an application case study consisting of seven applications from SPLASH-2, a first observation made in this paper is …
We present a novel architecture for flexible control of shading rates in a GPU pipeline, and demonstrate substantially reduced shading costs for various applications. We decouple shading and visibility by restricting and quantizing shading rates to a finite set of screen-aligned grids, leading to simpler and fewer changes to the GPU pipeline compared to …
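To make the idea of quantizing shading rates to screen-aligned grids concrete, here is a hypothetical sketch. It is not the architecture described in the paper. It assumes a made-up set of allowed per-axis grid spacings (`ALLOWED_SPACINGS`, in pixels per shading sample). A requested rate is snapped to the coarsest allowed grid that still meets it. Each pixel is then mapped to the grid cell that covers it, so neighboring pixels share one shading sample.

```python
# Hypothetical set of allowed grid spacings (pixels per shading sample, per axis).
ALLOWED_SPACINGS = (1, 2, 4)


def quantize_spacing(desired_rate: float) -> int:
    """Snap a desired per-axis shading rate (shading samples per pixel, in (0, 1])
    to the coarsest allowed grid spacing whose rate still meets or exceeds it."""
    if not 0 < desired_rate <= 1:
        raise ValueError("desired_rate must be in (0, 1]")
    # A spacing of s pixels yields 1/s shading samples per pixel along that axis.
    candidates = [s for s in ALLOWED_SPACINGS if 1 / s >= desired_rate]
    return max(candidates)  # spacing 1 always qualifies, so this is never empty


def shading_sample(x: int, y: int, sx: int, sy: int) -> tuple[int, int]:
    """Map pixel (x, y) to the screen-aligned shading-grid cell that covers it."""
    return (x // sx, y // sy)


if __name__ == "__main__":
    sx = quantize_spacing(0.3)  # 0.3 samples/pixel rounds up to the 1/2 grid
    sy = quantize_spacing(1.0)  # full rate
    print(sx, sy, shading_sample(5, 3, sx, sy))
```

Restricting rates to a few fixed, screen-aligned grids keeps the pixel-to-sample mapping a simple integer division, which is one plausible reason such a restriction simplifies hardware.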
On-line transaction processing exhibits poor memory behavior in high-end multiprocessor servers because of complex sharing patterns and substantial interaction between the database server and the operating system. One contributing source is a large number of load-store sequences in the program, resulting in many read misses as well as much global …
We propose a powerful hardware architecture for pixel shading, which enables flexible control of shading rates and automatic shading reuse between triangles in tessellated primitives. The main goal is efficient pixel shading for moderately to finely tessellated geometry, which is not handled well by current GPUs. Our method effectively decouples the cost of …
Parallel programs that modify shared data in a cache-coherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisitions at writes to shared data. This can have a significant impact on performance in a cache-coherent non-uniform memory access (NUMA) multiprocessor. By combining a …
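The ownership overhead described above, and the load-store sequences mentioned for on-line transaction processing, can be illustrated with a toy model. This is our own sketch, not the authors' simulator or their proposed technique. It assumes a simple MSI-style write-invalidate protocol with no Exclusive state. It counts read misses and ownership acquisitions for a trace of hypothetical `(processor, op, block)` tuples.

```python
from collections import defaultdict


def count_coherence_events(trace):
    """Count (read_misses, ownership_acquisitions) for a trace of
    (processor, op, block) tuples, op in {"R", "W"}, under a toy MSI-style
    write-invalidate protocol without an Exclusive state (an assumption)."""
    sharers = defaultdict(set)  # block -> processors holding a valid copy
    owner = {}                  # block -> processor holding it Modified
    read_misses = 0
    ownership_acquisitions = 0
    for proc, op, block in trace:
        if op == "R":
            if proc not in sharers[block]:
                read_misses += 1
                # Any remote Modified copy is downgraded to Shared.
                owner.pop(block, None)
                sharers[block].add(proc)
        elif op == "W":
            if owner.get(block) != proc:
                # Writing a block not held Modified: acquire ownership and
                # invalidate every other cached copy.
                ownership_acquisitions += 1
                sharers[block] = {proc}
                owner[block] = proc
        else:
            raise ValueError(f"unknown op {op!r}")
    return read_misses, ownership_acquisitions


if __name__ == "__main__":
    # Four processors take turns doing a load-store sequence on block 0, twice.
    migratory = [(p, op, 0) for _ in range(2) for p in range(4) for op in ("R", "W")]
    print(count_coherence_events(migratory))
```

In this migratory pattern, every load-store sequence costs one read miss followed by a separate ownership acquisition. A single processor repeating the same sequence pays each cost only once.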
Instruction-level simulation techniques are the predominant approach to evaluating the impact of architectural design alternatives on the performance of computer systems. Previous simulation approaches have not been capable of executing unmodified system software as well as application software at an acceptable performance level. Commercial applications, such as …
This paper assumes the availability of a very fast higher-dimensional rasterizer in future graphics processors. It is well known that such a rasterizer, working in up to five dimensions (i.e., adding time and lens parameters to the two screen dimensions), can be used to render scenes with both motion blur and depth of field. Our hypothesis is that such a rasterizer can also be used as a flexible tool …
In this paper, a concept for Virtual Machine Vision is proposed using RobCad, a commercial computer-aided robotics software package. The system uses ideal virtual cameras and lights to simulate a real vision system. Sensory data is sent to vision software for analysis. The Virtual Machine Vision system, together with the simulation model, can be …