Ashwini K. Nanda

Modern system design often requires multiple levels of simulation for design validation and performance debugging. However, while machines have gotten faster, and simulators have become more detailed, simulation speeds have not tracked machine speeds. As a result, it is difficult to simulate realistic problem sizes and hardware configurations for a target…
Performance of Web servers is critical to the success of many corporations and organizations. However, very few results have been published that quantitatively study the server behavior and identify the performance bottlenecks. In this paper we measure and analyze the behavior of the popular Apache Web server on a uniprocessor system and a 4-CPU SMP…
Recent research shows that the occupancy of the coherence controllers is a major performance bottleneck for distributed cache coherent shared memory multiprocessors. A significant part of the occupancy is due to the latency of accessing the directory, which is usually kept in DRAM memory. Most coherence controller designs that use protocol processors for…
Trace-driven simulation continues to be one of the main evaluation methods in the design of high performance processor-memory sub-systems. In this paper, we examine the varying speed-up opportunities available by processing a given trace in parallel on an IBM SP-2 machine. We also develop a simple, yet effective method of correcting for cold-start cache…
Recent research shows that the occupancy of the coherence controllers is a major performance bottleneck for distributed cache coherent shared memory multiprocessors. In this paper we study three approaches to alleviating this problem in hardwired coherence controllers, namely, multiple protocol engines, pipelined protocol engines, and split request-response…
To optimize performance and power of a processor's cache, a multiple-divided module (MDM) cache architecture is proposed to save power at memory peripherals as well as the bit array. For an M×B-divided MDM cache, latency is equivalent to that of the smallest module and power consumption is only 1/(M×B) of the regular, non-divided cache. Based on the…
The Cell Broadband Engine (Cell/B.E.) processor, developed jointly by Sony, Toshiba, and IBM primarily for next-generation gaming consoles, packs a level of floating-point, vector, and integer streaming performance in one chip that is an order of magnitude greater than that of traditional commodity microprocessors. Cell/B.E. blades are server and…