Learn More
<italic>While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the real challenge is how to easily create</italic> parallel software <italic>to effectively exploit all of this raw performance potential. One promising technique for(More)
Multithreaded processor architectures are becoming increasingly commonplace: many current and upcoming designs support chip multiprocessing, simultaneous multithreading, or both. While it is relatively straightforward to use these architectures to improve the throughput of a multithreaded or multiprogrammed workload, the real challenge is how to easily(More)
As we look to the future, and the prospect of a billion transistors on a chip, it seems inevitable that microprocessors will exploit having multiple parallel threads. To achieve the full potential of these " single-chip multi-processors, " however, we must find a way to parallelize non-numeric applications. Unfortunately, compilers have had little success(More)
While <i>soft processors</i> are increasingly common in FPGA-based embedded systems, it remains a challenge to scale their performance. We propose extending soft processor instruction sets to include support for vector processing. The resulting system of vectorized software and soft vector processor hardware is (i) <b>portable</b> to any FPGA architecture(More)
While there have been many recent proposals for hardware that supports <i>Thread-Level Speculation</i> (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS, which is stalls due to(More)
With the advent of chip-multiprocessors, we are faced with the challenge of parallelizing performance-critical software. Transactional memory (TM) has emerged as a promising programming model allowing programmers to focus on parallelism rather than maintaining correctness and avoiding deadlock. Many implementations of hardware , software, and hybrid support(More)
The increased demand for on-chip communication bandwidth as a result of the multi-core trend has made <i>networks on-chip</i> (NoCs) a compelling choice for the communication backbone in next-generation systems [3]. However, NoC designs have many power, area, and performance trade-offs in topology, buffer sizes, routing algorithms and flow control(More)
A key advantage of <i>soft</i> processors (processors built on an FPGA programmable fabric) over hard processors is that they can be customized to suit an application program's specific software. This notion has been exploited in the past principally through the use of application-specific instructions. While commercial soft processors are now widely(More)
Thread-level speculation (TLS) has proven to be a promising method of extracting parallelism from both integer and scientific workloads, targeting speculative threads that range in size from hundreds to several thousand dynamic instructions and have minimal dependences between them. Recent work has shown that TLS can offer compelling performance(More)