Rajagopalan Desikan

Learn More
Abstract: We measure the experimental error that arises from the use of non-validated simulators in computer architecture research, with the goal of increasing the rigor of simulation- based studies. We describe the methodology that we used to validate a microprocessor simulator against a Compaq DS-10L workstation, which contains an Alpha 21264 processor.(More)
This paper describes several methods for improving thescalability of memory disambiguation hardware for futurehigh ILP processors. As the number of in-flight instructionsgrows with issue width and pipeline depth, the load/storequeues (LSQ) threaten to become a bottleneck in both powerand latency. By employing lightweight approximate hashingin hardware with(More)
Growing on-chip wire delays will cause many future microarchitectures to be distributed, in which hardware resources within a single processor become nodes on one or more switched micronetworks. Since large processor cores will require multiple clock cycles to traverse, control must be distributed, not centralized. This paper describes the control protocols(More)
Impediments to main memory performance have traditionally been due to the divergence in processor versus memory speed and the pin bandwidth limitations of modern packaging technologies. In this paper we evaluate a magneto-resistive memory (MRAM)-based hierarchy to address these future constraints. MRAM devices are nonvolatile, and have the potential to be(More)
Pipeline flushes are becoming increasingly expensive in modern microprocessors with large instruction windows and deep pipelines. Selective re-execution is a technique that can reduce the penalty of mis-speculations by re-executing only instructions affected by the mis-speculation, instead of all instructions. In this paper we introduce a new selective(More)
This short paper serves to correct the errors contained in the paper entitled "Measuring Experimental Error in Microprocessor Simulation," presented at the 2001 International Symposium on Computer Architecture (ISCA-28) [2]. That paper contained a study of a validated microarchitectural simulator called sim-alpha, and included a case study that compared(More)
Transmission of cache lines in cache-coherent shared memory machines is necessary for communication but can cause significant latencies across the system. The ongoing growth in cache capacities shifts the distribution of cache misses from capacity and conflict misses to coherence misses, which consist of misses caused by both true and false sharing. In this(More)
False sharing of data is an important phenomenon affecting performance in shared memory multiprocessors. False sharing results in unnecessary coherency overhead by causing invalidation of the shared cache line, and increasing the latency of the load accessing the cache line. As microprocessors incorporate increasingly large caches with large cache lines,(More)
This te hni al report des ribes installation, use, and design of sim-alpha, an exe ution driven Alpha 21264 simulator. To in rease simulator a ura y, we have in orporated many of the low level features found in the Alpha 21264. When ompared to a hardware 21264 implementation, sim-alpha a hieves 2% error a ross a suite of mi roben hmarks designed to stress(More)
In this paper, we describe the design and implementation of the primary memory system of the TRIPS processor. To match the aggressive execution bandwidth and support high levels of memory parallelism, the primary memory system is completely partitioned into four banks, can support up to 256 in-flight memory instructions, aggressive reordering of in-flight(More)