Nicholas P. Carter

Learn More
For parallelism to become tractable for mass programmers, shared-memory languages and environments must evolve to enforce disciplined practices that ban "wild shared-memory behaviors;'' e.g., unstructured parallelism, arbitrary data races, and ubiquitous non-determinism. This software evolution is a rare opportunity for hardware designers to rethink(More)
The M-Machine is an experimental multicomputer being developed to test architectural concepts motivated by the constraints of modern semiconductor technology and the demands of programming systems. The M-Machine computing nodes are connected with a 3-D mesh network; each node is a multithreaded processor incorporating 9 function units, on-chip cache, and(More)
Traditional methods of providing protection in memory systems do so at the cost of increased context switch time and/or increased storage to record access permissions for processes. With the advent of computers that supported cycle-by-cycle multithreading, protection schemes that increase the time to perform a context switch are unacceptable, but protecting(More)
Much of the improvement in computer performance over the last twenty years has come from faster transistors and architectural advances that increase parallelism. Historically, parallelism has been exploited either at the instruction level with a grain-size of a single instruction or by partitioning applications into coarse threads with grain-sizes of(More)
DARPA's Ubiquitous High-Performance Computing (UHPC) program asked researchers to develop computing systems capable of achieving energy efficiencies of 50 GOPS/Watt, assuming 2018-era fabrication technologies. This paper describes Runnemede, the research architecture developed by the Intel-led UHPC team. Runnemede is being developed through a co-design(More)
We compare techniques that dynamically scale the voltage of individual network links to reduce power consumption with an approach in which all links in the network are set to the same voltage and adaptive routing is used to distribute load across the network. Our results show that adaptive routing with static network link voltages outperforms(More)
Current electronic systems implement reliability using only a few layers of the system stack, which simplifies the design of other layers but is becoming increasingly expensive over time. In contrast, cross-layer resilient systems, which distribute the responsibility for tolerating errors, device variation, and aging across the system stack, have the(More)
An automatic music score-reading system will facilitate applications including computer-based editing of new editions, production of databases for musicological research, and creation of braille or large-format scores for the blind or partially-sighted. The work described here deals specifically with initial processing of images containing early seventeenth(More)
We are rapidly approaching an inflection point where the conventional target of producing perfect, identical transistors that operate without upset can no longer be maintained while continuing to reduce the energy per operation. With power requirements already limiting chip performance, continuing to demand perfect, upset-free transistors would mean the end(More)
In order to pose a successful challenge to conventional processor architectures, reconfigurable computing systems must achieve significantly better performance than conventional programmable processors by both greatly reducing the number of clock cycles required to execute a wide range of applications and achieving high clock rates when implemented in(More)