Over the last twenty years, the open source community has provided more and more software on which the world's High Performance Computing (HPC) systems depend for performance and productivity. The community has invested millions of dollars and years of effort to build key components. But although the investments in these separate software elements have been …
The nondeterministic nature of parallel programs is the major difficulty in debugging them. Order-replay, a technique that addresses this problem, is widely used because of its small overhead. It has, however, several serious drawbacks: all processes of the parallel program have to participate in replay even when some of them are clearly not involved in the bug; and the …
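The order-replay idea the abstract describes can be illustrated with a minimal Python sketch: in the record phase, log the global order in which threads enter a shared critical section; in the replay phase, force threads to re-enter in exactly that order, making the run deterministic. The function names `record` and `replay` are illustrative only, not from the paper.

```python
import threading

def record(num_threads, steps):
    """Record phase: log the global order of critical-section entries."""
    log, lock = [], threading.Lock()
    def worker(tid):
        for _ in range(steps):
            with lock:
                log.append(tid)  # the nondeterministic event we capture
    ts = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in ts: t.start()
    for t in ts: t.join()
    return log

def replay(schedule, num_threads):
    """Replay phase: admit threads in exactly the recorded order."""
    out, cond, state = [], threading.Condition(), {"idx": 0}
    def worker(tid):
        while True:
            with cond:
                # Wait until it is this thread's turn, or the schedule is done.
                cond.wait_for(lambda: state["idx"] >= len(schedule)
                              or schedule[state["idx"]] == tid)
                if state["idx"] >= len(schedule):
                    return
                out.append(tid)       # deterministic re-execution of the event
                state["idx"] += 1
                cond.notify_all()
    ts = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in ts: t.start()
    for t in ts: t.join()
    return out

log = record(3, 5)
assert replay(log, 3) == log  # the replay reproduces the recorded interleaving
```

Note this sketch replays every thread; the drawback the abstract points out is precisely that all processes must participate even when most are unrelated to the bug.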
To reduce the overhead of synchronizing operations on shared-memory multiprocessors, this paper proposes a mechanism, named specMEM, that speculatively executes memory accesses following a synchronizing operation before completion of the synchronization is confirmed. A unique feature of our mechanism is that the detection of speculation failure and …
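A rough software analog of this speculation, assuming a version-counter scheme in the style of a seqlock rather than the paper's hardware detection mechanism: a reader proceeds past the synchronization optimistically, then confirms afterwards that no conflicting write occurred, retrying on detected failure. The class `SpecCell` is hypothetical, not an API from the paper.

```python
import threading

class SpecCell:
    """Illustrative analog of speculative reads past a synchronizing write.

    Readers run optimistically and retry when a conflicting update is
    detected via a version counter (specMEM itself detects failures in
    hardware, which this sketch does not model).
    """
    def __init__(self, value=0):
        self._value = value
        self._version = 0          # even: stable; odd: write in progress
        self._lock = threading.Lock()

    def write(self, value):
        with self._lock:
            self._version += 1     # odd: any in-flight speculation must fail
            self._value = value
            self._version += 1     # even again: new stable state

    def read(self):
        while True:
            v0 = self._version
            if v0 % 2:             # writer active: speculation would fail
                continue
            value = self._value    # speculative access, not yet confirmed
            if self._version == v0:
                return value       # confirmed: no conflicting write occurred
            # else: failure detected, discard the value and retry
```

The win, as in the paper, is that the common conflict-free case pays no blocking cost; only detected conflicts force a rollback and retry.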
In the FGCS project, we have developed a parallel inference machine, PIM/m, as one of the final products of the project. PIM/m has up to 256 processor elements (PEs) connected by a 16 × 16 mesh network, while its predecessor, Multi-PSI/v2, has 64 PEs. A PE has three custom VLSI chips, one of which is a pipelined microprocessor having special mechanisms for …
In our research project "Mega-Scale Computing Based on Low-Power Technology and Workload Modeling", we claim that a million-scale parallel system could be built with densely mounted low-power commodity processors. "MegaProto" is a proof-of-concept low-power and high-performance cluster built only with commodity components to demonstrate this claim. A …