J. Gregory Steffan

While architects understand how to build cost-effective parallel machines across a wide spectrum of machine sizes (ranging from within a single chip to large-scale servers), the real challenge is how to easily create <i>parallel software</i> to effectively exploit all of this raw performance potential. One promising technique for …
As we look to the future, and the prospect of a billion transistors on a chip, it seems inevitable that microprocessors will exploit having multiple parallel threads. To achieve the full potential of these “single-chip multiprocessors,” however, we must find a way to parallelize non-numeric applications. Unfortunately, compilers have had little success in …
With the advent of chip-multiprocessors, we are faced with the challenge of parallelizing performance-critical software. Transactional memory (TM) has emerged as a promising programming model allowing programmers to focus on parallelism rather than maintaining correctness and avoiding deadlock. Many implementations of hardware, software, and hybrid support …
Multithreaded processor architectures are becoming increasingly commonplace: many current and upcoming designs support chip multiprocessing, simultaneous multithreading, or both. While it is relatively straightforward to use these architectures to improve the throughput of a multithreaded or multiprogrammed workload, the real challenge is how to easily …
Thread-Level Speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this paper, we show that the key to good performance lies in the three different ways to communicate a value between speculative threads: speculation, synchronization, and …
While there have been many recent proposals for hardware that supports <i>Thread-Level Speculation</i> (TLS), there has been relatively little work on compiler optimizations to fully exploit this potential for parallelizing programs optimistically. In this paper, we focus on one important limitation of program performance under TLS, which is stalls due to …
While <i>soft processors</i> are increasingly common in FPGA-based embedded systems, it remains a challenge to scale their performance. We propose extending soft processor instruction sets to include support for vector processing. The resulting system of vectorized software and soft vector processor hardware is (i) <b>portable</b> to any FPGA architecture …
As more embedded systems are built using FPGA platforms, there is an increasing need to support processors in FPGAs. One option is the <i>soft processor</i>, a programmable instruction processor implemented in the reconfigurable logic of the FPGA. Commercial soft processors have been widely deployed, and hence we are motivated to understand their …
The increased demand for on-chip communication bandwidth as a result of the multi-core trend has made <i>networks on-chip</i> (NoCs) a compelling choice for the communication backbone in next-generation systems [3]. However, NoC designs have many power, area, and performance trade-offs in topology, buffer sizes, routing algorithms and flow control …