Rubén González

Learn More
Multi-core processors naturally exploit thread-level par-allelism (TLP). However, extracting instruction-level paral-lelism (ILP) from individual applications or threads is still a challenge as application mixes in this environment are nonuniform. Thus, multi-core processors should be flexible enough to provide high throughput for uniform parallel(More)
This paper proposes a new organization for clustered processors. Such processors have many advantages, including improved implementability and scalability, reduced power, and, potentially, faster clock speed. Difficulties lie in assigning instructions to clusters (steering) so as to minimize the effect of inter-cluster communication latency. The(More)
A register file is a critical component of a modernsuperscalar processor.It has a large number of entriesand read/write ports in order to enable high levels ofinstruction parallelism.As a result, the register file'sarea, access time, and energy consumption increasedramatically, significantly affecting the overallsuperscalar processor's performance and(More)
We examine the ability of CMPs, due to their lower onchip communication latencies, to exploit data parallelism at inner-loop granularities similar to that commonly targeted by vector machines. Parallelizing code in this manner leads to a high frequency of barriers, and we explore the impact of different barrier mechanisms upon the efficiency of this(More)
Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are composed of structures that do not scale to large instruction windows because of timing and power constraints. However, the behavior of(More)
This paper presents a novel mechanism for barrier synchronization on chip multi-processors (CMPs). By forcing the invalidation of selected I-cache lines, this mechanism starves threads and thus forces their execution to stop. Threads are let free when all have entered the barrier.We evaluated this mechanism using SMTSim and report much better (and most(More)
Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl’s Law, such designs will be increasingly limited by the remaining sequential components of applications. To overcome this limitation it is necessary to design processors with many lower–performance cores(More)
In recent years, processor manufacturers have converged on two types of register file architectures. Both IBM with its POWER series and Intel with its Pentium series are using a central storage for all in-flight values, which offers a high performance potential. AMD, on the other hand, uses an optimized implementation of the Future File for its line of(More)
The majority of register file designs follow one of two well– known approaches. Many modern high-performance processors (POWER4 [1], Pentium4 [2]) use a merged register file that holds both architectural and rename registers. Other processors use a Future File (eg, Opteron [3]) with rename registers kept separately in reservation stations. Both approaches(More)
The mode in which sexual organisms choose mates is a key evolutionary process, as it can have a profound impact on fitness and speciation. One way to study mate choice in the wild is by measuring trait correlation between mates. Positive assortative mating is inferred when individuals of a mating pair display traits that are more similar than those expected(More)