The Raw microprocessor consumes 122 million transistors, executes 16 different load, store, integer or floating-point instructions every cycle, controls 25 GB/s of I/O bandwidth, and has 2 MB of on-chip, distributed L1 SRAM memory, providing on-chip memory bandwidth of 43 GB/s. Is this the latest billion-dollar 3,000 man-year processor effort? In fact, Raw ...
In the era of multi-core, computer vision has emerged as an exciting application area which promises to continue to drive the demand for both more powerful and more energy-efficient processors. Although there is still a long way to go, vision has matured significantly over the last few decades, and the list of applications that are useful to end users ...
Due to the breakdown of Dennardian scaling, the percentage of a silicon chip that can switch at full frequency is dropping exponentially with each process generation. This utilization wall forces designers to ensure that, at any point in time, large fractions of their chips are effectively dark or dim silicon, i.e., either idle or significantly ...
This paper evaluates the Raw microprocessor. Raw addresses the challenge of building a general-purpose architecture that performs well on a larger class of stream and embedded computing applications than existing microprocessors, while still running existing ILP-based sequential programs with reasonable performance in the face of increasing wire delays. Raw ...
Tiled architectures provide a paradigm for designers to turn silicon resources into processors with burgeoning quantities of programmable functional units and memories. The architecture has a dual responsibility: first, it must expose these resources in a way that is programmable. Second, it needs to manage the power associated with such resources. We ...
Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and ...
The bypass paths and multiported register files in microprocessors serve as an implicit interconnect to communicate operand values between pipeline stages and multiple ALUs. Previous superscalar designs used centralized structures for this interconnect, which do not scale with increasing ILP demands. In search of scalability, recent microprocessor designs in ...
Rapid advances in technology force a quest for computer architectures that exploit new opportunities and shed existing mechanisms that do not scale. Current architectures, such as hardware-scheduled superscalars, are already hitting performance and complexity limits and cannot be scaled indefinitely. The Reconfigurable Architecture Workstation (Raw) is a ...
Complex “fat operators” are important contributors to the efficiency of specialized hardware. This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies. These techniques focus on minimizing critical path length and load-use ...