The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For long running applications using a large number of processors, it is essential that fault tolerance be used to prevent a total loss of all finished computations after a failure.
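The scaling claim in the abstract above can be illustrated with a small sketch. It assumes that node failures are independent and that every node has the same failure probability over the run; neither assumption comes from the abstract, and the function name `failure_probability` is hypothetical.

```python
def failure_probability(p_node: float, n_nodes: int) -> float:
    """Probability that at least one of n_nodes fails during the run.

    Assumption (not from the source): failures are independent and each
    node fails with the same probability p_node over the run.
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be a probability in [0, 1]")
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    # The run survives only if every node survives.
    return 1.0 - (1.0 - p_node) ** n_nodes


if __name__ == "__main__":
    # Show how the failure probability grows with the node count.
    for n in (1, 10, 100, 1000, 10000):
        print(n, failure_probability(0.001, n))
```

Under these assumptions the result grows monotonically with the node count, which is the effect the abstract describes.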
Commercial graphics processing units (GPUs) have proven attractive and inexpensive for high performance scientific applications. However, recent research through Folding@home demonstrates that two-thirds of the GPUs tested on Folding@home exhibit a detectable, pattern-sensitive rate of memory soft errors for GPGPU. Fault tolerance has been viewed as critical to …
The ability of robots to quickly and accurately localize their neighbors is extremely important in swarm robotics. Prior approaches generally rely either on global information provided by GPS, beacons, and landmarks, or complex local information provided by vision systems. In this paper we provide a new technique, based on trilateration. This system is …
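For readers unfamiliar with trilateration, the following is a minimal sketch of the textbook 2D formulation: given three reference points with known coordinates and the measured distance from each to an unknown point, subtracting the circle equations pairwise yields a linear system in the unknown coordinates. This is a generic illustration, not the system the abstract describes; the function name `trilaterate` and the example coordinates are hypothetical.

```python
import math


def trilaterate(p1, p2, p3, r1, r2, r3):
    """Return (x, y) at distances r1, r2, r3 from points p1, p2, p3.

    Generic textbook method (an assumption, not the paper's system):
    subtracting the circle equation of p1 from those of p2 and p3
    gives two linear equations, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a = 2.0 * (x2 - x1)
    b = 2.0 * (y2 - y1)
    c = r1**2 - r2**2 - x1**2 + x2**2 - y1**2 + y2**2
    d = 2.0 * (x3 - x1)
    e = 2.0 * (y3 - y1)
    f = r1**2 - r3**2 - x1**2 + x3**2 - y1**2 + y3**2
    det = a * e - b * d
    if abs(det) < 1e-12:
        # Collinear reference points do not determine a unique position.
        raise ValueError("reference points must not be collinear")
    return ((c * e - b * f) / det, (a * f - c * d) / det)


if __name__ == "__main__":
    # Hypothetical example: the true position is (1, 1).
    anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
    dists = [math.dist(a, (1.0, 1.0)) for a in anchors]
    print(trilaterate(*anchors, *dists))
```

With noisy range measurements the three circles generally do not meet in a single point, so a practical system would need a least-squares or filtering step that this sketch omits.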
Scalable circuits of organic logic and memory are realized using all-additive printing processes. A 3-bit organic complementary decoder is fabricated and used to read and write non-volatile, rewritable ferroelectric memory. The decoder-memory array is patterned by inkjet and gravure printing on flexible plastics. Simulation models for the organic …
A novel approach is presented that bridges the gap between anomaly and misuse detection for identifying cyber attacks. The approach consists of an ensemble of classifiers that, together, produce a more informative output regarding the class of attack than any of the classifiers alone. Each classifier classifies based on a limited subset of possible features …
Checkpointing is the most popular fault tolerance method used in high-performance computing (HPC) systems. However, increasing failure rates require more frequent checkpoints, which makes checkpointing more expensive. We present a checkpoint-free fault tolerance technique. It takes advantage of both data dependencies and communication-induced redundancies …
Teams of autonomous cooperating vehicles are well-suited for meeting the challenges associated with mobile marine sensor networks. Swarms built using a physicomimetics approach exhibit predictable behavior, an important benefit for extended duration deployments of autonomous ocean platforms. By using a decentralized control framework, we minimize energy …
Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Soft errors normally do not interrupt the execution of the affected program, but the affected computation results can no longer be trusted. A well-known technique to correct soft errors in matrix-matrix multiplication is algorithm-based fault …
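As background for the abstract above, here is a minimal sketch of the classic checksum scheme for matrix multiplication usually attributed to Huang and Abraham: a column-checksum row is appended to A and a row-checksum column to B, so that the product carries checksums that can locate and correct a single corrupted entry. It is a textbook illustration in pure Python, not necessarily the variant the truncated abstract goes on to describe; all function names are hypothetical.

```python
def matmul(a, b):
    """Plain triple-loop matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add_column_checksum(a):
    """Append one row holding the sum of each column of a."""
    return [row[:] for row in a] + [[sum(col) for col in zip(*a)]]


def add_row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def locate_and_correct(c_full, tol=1e-9):
    """Fix a single corrupted data entry of a full-checksum product in place.

    Returns None if all checksums agree, or the (row, col) that was fixed.
    Assumption: at most one entry of the data part is wrong; any other
    pattern (including a corrupted checksum entry) raises ValueError.
    """
    m, n = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(m)
                if abs(sum(c_full[i][:n]) - c_full[i][n]) > tol]
    bad_cols = [j for j in range(n)
                if abs(sum(c_full[i][j] for i in range(m)) - c_full[m][j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # The row checksum mismatch equals the size of the error.
        c_full[i][j] -= sum(c_full[i][:n]) - c_full[i][n]
        return (i, j)
    raise ValueError("error pattern is not a single data-entry fault")


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c_full = matmul(add_column_checksum(a), add_row_checksum(b))
    c_full[0][1] += 5  # simulate a soft error in one result entry
    print(locate_and_correct(c_full))
    print([row[:-1] for row in c_full[:-1]])  # recovered product
```

The tolerance `tol` is needed for floating-point data, where rounding makes the checksums agree only approximately; choosing it well is a known practical difficulty that this sketch does not address.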
Keywords: Algorithm-based fault tolerance; Matrix multiplication; Fault tolerant linear algebra; On-line algorithm based fault tolerance. Abstract: Soft errors are one-time events that corrupt the state of a computing system but not its overall functionality. Soft errors normally do not interrupt the execution of the affected program, but the affected …
In today's high performance computing, many MPI programs (e.g., ScaLAPACK applications, High Performance Linpack Benchmark HPL, and many PDE solvers based on domain decomposition methods) organize their computational processes as multidimensional process grids. Communications are often necessary in each dimension. Multidimensional broadcast, where a …