Ashwini K. Nanda

Modern system design often requires multiple levels of simulation for design validation and performance debugging. However, while machines have gotten faster, and simulators have become more detailed, simulation speeds have not tracked machine speeds. As a result, it is difficult to simulate realistic problem sizes and hardware configurations for a target …
Performance of Web servers is critical to the success of many corporations and organizations. However, very few results have been published that quantitatively study the server behavior and identify the performance bottlenecks. In this paper we measure and analyze the behavior of the popular Apache Web server on a uniprocessor system and a 4-CPU SMP …
Recent research shows that the occupancy of the coherence controllers is a major performance bottleneck for distributed cache coherent shared memory multiprocessors. A significant part of the occupancy is due to the latency of accessing the directory, which is usually kept in DRAM memory. Most coherence controller designs that use protocol processors for …
Scalable distributed shared-memory architectures rely on coherence controllers on each processing node to synthesize cache-coherent shared memory across the entire machine. The coherence controllers execute coherence protocol handlers that may be hardwired in custom hardware or programmed in a protocol processor within each coherence controller. Although …
Trace-driven simulation continues to be one of the main evaluation methods in the design of high performance processor-memory sub-systems. In this paper, we examine the varying speed-up opportunities available by processing a given trace in parallel on an IBM SP-2 machine. We also develop a simple, yet effective method of correcting for cold-start cache …
Recent research shows that the occupancy of the coherence controllers is a major performance bottleneck for distributed cache coherent shared memory multiprocessors. In this paper we study three approaches to alleviating this problem in hardwired coherence controllers, namely, multiple protocol engines, pipelined protocol engines, and split request-response …