Henry Hoffmann

Learn More
The Raw microprocessor consumes 122 million transistors, executes 16 different load, store, integer or floating point instructions every cycle, controls 25 GB/s of I/O bandwidth, and has 2 MB of on-chip, distributed L1 SRAM memory, providing on-chip memory bandwidth of 43 GB/s. Is this the latest billion-dollar 3,000 man-year processor effort? In fact, Raw(More)
......As the number of processor cores integrated onto a single die increases, the design space for interconnecting these cores becomes more fertile. One manner of interconnecting the cores is simply to mimic multichip, multiprocessor computers of the past. Following past practice, simple busbased shared-memory multiprocessors can be integrated onto a(More)
With the increasing miniaturization of transistors, wire delays are becoming a dominant factor in microprocessor performance. To address this issue, a number of emerging architectures contain replicated processing units with software-exposed communication between one unit and another (e.g., Raw, SmartMemories, TRIPS). However, for their use to be(More)
Many modern computations (such as video and audio encoders, Monte Carlo simulations, and machine learning algorithms) are designed to trade off accuracy in return for increased performance. To date, such computations typically use ad-hoc, domain-specific techniques developed specifically for the computation at hand. Loop perforation provides a general(More)
This paper evaluates the Raw microprocessor. Raw addresses thechallenge of building a general-purpose architecture that performswell on a larger class of stream and embedded computing applicationsthan existing microprocessors, while still running existingILP-based sequential programs with reasonable performance in theface of increasing wire delays. Raw(More)
We present PowerDial, a system for dynamically adapting application behavior to execute successfully in the face of load and power fluctuations. PowerDial transforms static configuration parameters into dynamic knobs that the PowerDial control system can manipulate to dynamically trade off the accuracy of the computation in return for reductions in the(More)
Many modern computations (such as video and audio encoders, Monte Carlo simulations, and machine learning algorithms) are designed to trade off accuracy in return for increased performance. To date, such computations typically use ad-hoc, domain-specific techniques developed specifically for the computation at hand. We present a new general technique, code(More)
Image and video codecs are prevalent in multimedia devices, ranging from embedded systems, to desktop computers, to high-end servers such as HDTV editing consoles. It is not uncommon however that developers create and customize separate coder and decoder implementations for each of the architectures they target. This practice is time consuming and error(More)
Streaming programs represent an in reasingly important and widespread lass of appli ations that holds unpre edented opportunities for high-impa t ompiler te hnology. Unlike sequential programs with obs ured dependen e information and omplex ommuni ation patterns, a stream program is naturally written as a set of on urrent lters with regular steady-state(More)
As the complexity of computing systems increases, application programmers must be experts in their application domain and have the systems knowledge required to address the problems that arise from parallelism, power, energy, and reliability concerns. One approach to relieving this burden is to make use of self-aware computing systems, which automatically(More)