Oliverio J. Santana

Learn More
The continuously increasing gap between processor and memory speeds is a serious limitation to the performance achievable by future microprocessors. Currently, processors tolerate long-latency memory operations largely by maintaining a high number of in-flight instructions. In the future, this may require supporting many hundreds, or even thousands, of(More)
Nowadays, multithreaded architectures are becoming more and more popular. In order to evaluate their behavior , several methodologies and metrics have been proposed. A methodology defines when the measurements of a given workload execution are taken. A metric combines those measurements to obtain a final evaluation result. However, since current evaluation(More)
Fetch performance is a very important factor because it effectively limits the overall processor performance. However, there is little performance advantage in increasing front-end performance beyond what the back-end can consume. For each processor design, the target is to build the best possible fetch engine for the required performance level A fetch(More)
In this paper, we propose Runahead Threads (RaT) as a valuable solution for both reducing resource contention and exploiting memory-level parallelism in Simultaneous Multithreaded (SMT) processors. Our technique converts a resource intensive memory-bound thread to a speculative light thread under long-latency blocking memory operations. These speculative(More)
Fetch engine performance is a key topic in superscalar processors, since it limits the instruction-level parallelism that can be exploited by the execution core. In the search of high performance, the fetch engine has evolved toward more efficient designs, but its complexity has also increased.In this paper, we present the stream fetch engine, a novel(More)
Fast instruction decoding is a challenge for the design of CISC microprocessors. A well-known solution to overcome this problem is using a trace cache. It stores and fetches already decoded instructions, avoiding the need for decoding them again. However, implementing a trace cache involves an important increase in the fetch architecture complexity.In this(More)