Tony Nowatzki

Accelerators and specialization in various forms are emerging as a way to increase processor performance. Examples include Navigo, Conservation-Cores, BERET, and DySER. While each of these employs different primitives and principles to achieve specialization, they share some common concerns with regard to implementation. Two of these concerns are: how to…
General-purpose processors (GPPs), from small in-order designs to many-issue out-of-order designs, incur large power overheads which must be addressed for future technology generations. Major sources of overhead include structures which dynamically extract the data-dependence graph or maintain precise state. Considering irregular workloads, current specialization…
…ion errors than an unknown black box. Reviewers and the community need to change their mindset as well – having blind faith in “standard tools” while completely discounting other tools is not appropriate. We revisit the issue of open versus in-house tools in Section 4. 3.2. Pitfall 2: False confidence from validation overgeneralization in simulator papers,…
The end of Dennard scaling has forced architects to focus on designing for execution efficiency. Coarse-grained reconfigurable architectures (CGRAs) are a class of architectures that provide a configurable grouping of functional units and aim to bridge the gap between the power and performance of custom hardware and the flexibility of software. Despite…
Specialized execution using spatial architectures provides energy efficient computation, but requires effective algorithms for spatially scheduling the computation. Generally, this has been solved with architecture-specific heuristics, an approach which suffers from poor compiler/architect productivity, lack of insight on optimality, and inhibits migration…
Modern microprocessors exploit data-level parallelism through in-core data-parallel accelerators in the form of short vector ISA extensions such as SSE/AVX and NEON. Although these ISA extensions have existed for decades, compilers do not generate good-quality, high-performance vectorized code without significant programmer intervention and manual…
Hardware specialization has emerged as a promising paradigm for future microprocessors. Unfortunately, it is natural to develop and evaluate such architectures within end-to-end vertical silos spanning application, language/compiler, hardware design, and evaluation tools, leaving little opportunity for cross-architecture analysis and innovation. This paper…
Specialization and accelerators are being proposed as an effective way to address the slowdown of Dennard scaling. DySER is one such accelerator, which dynamically synthesizes large compound functional units to match program regions, using a co-designed compiler and microarchitecture. We have completed a full prototype implementation of DySER integrated…