Multi-core processors naturally exploit thread-level parallelism (TLP). However, extracting instruction-level parallelism (ILP) from individual applications or threads is still a challenge as application mixes in this environment are nonuniform. Thus, multi-core processors should be flexible enough to provide high throughput for uniform parallel …
This paper proposes a new organization for clustered processors. Such processors have many advantages, including improved implementability and scalability, reduced power, and, potentially, faster clock speed. Difficulties lie in assigning instructions to clusters (steering) so as to minimize the effect of inter-cluster communication latency. The …
A register file is a critical component of a modern superscalar processor. It has a large number of entries and read/write ports in order to enable high levels of instruction parallelism. As a result, the register file's area, access time, and energy consumption increase dramatically, significantly affecting the overall superscalar processor's performance and …
We examine the ability of CMPs, due to their lower on-chip communication latencies, to exploit data parallelism at inner-loop granularities similar to that commonly targeted by vector machines. Parallelizing code in this manner leads to a high frequency of barriers, and we explore the impact of different barrier mechanisms upon the efficiency of this …
Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elusive goal. Traditional processors are composed of structures that do not scale to large instruction windows because of timing and power constraints. However, the behavior of …
With the aim of fuelling open-source, translational, early-stage drug discovery activities, the results of the recently completed antimycobacterial phenotypic screening campaign against Mycobacterium bovis BCG with hit confirmation in M. tuberculosis H37Rv were made publicly accessible. A set of 177 potent non-cytotoxic H37Rv hits was identified and will be …
This paper presents a novel mechanism for barrier synchronization on chip multi-processors (CMPs). By forcing the invalidation of selected I-cache lines, this mechanism starves threads and thus forces their execution to stop. Threads are let free when all have entered the barrier. We evaluated this mechanism using SMTSim and report much better (and most …
Multicore processors have emerged as a powerful platform on which to efficiently exploit thread-level parallelism (TLP). However, due to Amdahl’s Law, such designs will be increasingly limited by the remaining sequential components of applications. To overcome this limitation it is necessary to design processors with many lower-performance cores …
In recent years, processor manufacturers have converged on two types of register file architectures. Both IBM with its POWER series and Intel with its Pentium series are using a central storage for all in-flight values, which offers a high performance potential. AMD, on the other hand, uses an optimized implementation of the Future File for its line of …