Learn More
S oft errors, also called single-event upsets (SEUs), are radiation-induced transient errors caused by neutrons from cosmic rays and alpha particles from packaging material. Traditionally, soft errors were regarded as a major concern only for space applications. Yet, for designs manufactured at advanced technology nodes—such as 90 nm, 65 nm, and(More)
CASP, Concurrent Autonomous chip self-test using <b>S</b>tored test <b>P</b>atterns, is a special kind of self-test where a system tests itself concurrently during normal operation without any downtime visible to the <b>end-user.</b> CASP consists of two ideas: 1. Storage of very thorough test patterns in non-volatile memory; and, 2. Architectural and(More)
There is a growing concern about the increasing vulnerability of future computing systems to errors in the underlying hardware. Traditional redundancy techniques are expensive for designing energy-efficient systems that are resilient to high error rates. We present Error Resilient System Architecture (ERSA), a low-cost robust system architecture for(More)
Three-dimensional die stacking integration provides the ability to stack multiple layers of processed silicon with a large number of vertical interconnects. Through Silicon Vias (TSVs) provide a promising area- and power-efficient way to support communication between different stack layers. Unfortunately, low TSV yield significantly impacts design of(More)
We present here a report produced by a workshop on 'Addressing failures in exascale computing' held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to establish a common taxonomy about resilience across all the levels in a computing system, discuss existing knowledge on resilience across the various hardware and software layers of an(More)
Disk-oriented approaches to online storage are becoming increasingly problematic: they do not scale gracefully to meet the needs of large-scale Web applications, and improvements in disk capacity have far outstripped improvements in access latency and bandwidth. This paper argues for a new approach to datacenter storage called RAMCloud, where information is(More)