Ramadass Nagarajan

This paper describes the polymorphous TRIPS architecture which can be configured for different granularities and types of parallelism. TRIPS contains mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction, data, or thread-level parallelism. To adapt to small and …
Most recommender systems use Collaborative Filtering or Content-based methods to predict new items of interest for a user. While both methods have their own advantages, individually they fail to provide good recommendations in many situations. Incorporating components from both methods, a hybrid recommender system can overcome these shortcomings. In this …
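As a rough illustration of the hybrid idea sketched in this abstract, the snippet below blends a collaborative-filtering score with a content-based score using a simple weighted average; the function name, example scores, and mixing weight alpha are hypothetical and not taken from the paper.

def hybrid_score(cf_score, content_score, alpha=0.6):
    # Blend a collaborative-filtering score with a content-based score.
    # alpha is an assumed mixing weight; the paper's actual combination
    # scheme may differ.
    return alpha * cf_score + (1.0 - alpha) * content_score

# An item with a weak CF signal (e.g., few co-ratings) can still surface
# if its content similarity to the user's profile is high.
print(hybrid_score(cf_score=0.2, content_score=0.9))  # 0.48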
In this paper, we survey the design space of a new class of architectures called Grid Processor Architectures (GPAs). These architectures are designed to scale with technology, allowing faster clock rates than conventional architectures while providing superior instruction-level parallelism on traditional workloads and high performance across a range of …
Growing on-chip wire delays will cause many future microarchitectures to be distributed, in which hardware resources within a single processor become nodes on one or more switched micronetworks. Since signals will require multiple clock cycles to traverse large processor cores, control must be distributed, not centralized. This paper describes the control protocols …
Technology trends present new challenges for processor architectures and their instruction schedulers. Growing transistor density will increase the number of execution units on a single chip, and decreasing wire transmission speeds will cause long and variable on-chip latencies. These trends will severely limit the two dominant conventional architectures: …
This paper describes the polymorphous TRIPS architecture that can be configured for different granularities and types of parallelism. The TRIPS architecture is the first in a class of post-RISC, dataflow-like instruction sets called explicit data-graph execution (EDGE). This EDGE ISA is coupled with hardware mechanisms that enable the processing …
While microprocessor pipeline depths have increased dramatically over the last decade, they are fast approaching their optimal depth. As shown in Figure 9.6.1, the number of logic levels in modern processors is nearing 10 fanout-of-4 (FO4) inverter delays. Substantial further reductions will be undesirable due to pipeline overheads and power consumption [1]. …
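As a rough illustration of why further pipeline deepening yields diminishing returns, the clock period can be modeled as the useful logic delay per stage plus a fixed per-stage overhead for latches and clock skew, all in FO4 units; the overhead value below is an assumed figure for illustration, not one from the paper.

T_{clk} \approx \frac{D_{logic}}{n_{stages}} + t_{overhead}

With an assumed overhead of 3 FO4, cutting the logic per stage from 10 FO4 to 5 FO4 only shortens the clock period from 13 FO4 to 8 FO4, well short of the 2x speedup the doubled stage count might suggest, while the per-stage overheads and power grow.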
Predication facilitates high-bandwidth fetch and large static scheduling regions, but has typically been too complex to implement comprehensively in out-of-order microarchitectures. This paper describes dataflow predication, which provides per-instruction predication in a dataflow ISA, low predication computation overheads similar to VLIW ISAs, and low …
Fast, accurate, and effective performance analysis is essential for the design of modern processor architectures and improving application performance. Recent trends toward highly concurrent processors make this goal increasingly difficult. Conventional techniques, based on simulators and performance monitors, are ill-equipped to analyze how a plethora of …