Pradeep Ramachandran

Learn More
Future microprocessors need low-cost solutions for reliable operation in the presence of failure-prone devices. A promising approach is to detect hardware faults by deploying low-cost monitors of software-level symptoms of such faults. Recently, researchers have shown these mechanisms work well, but there remains a non-negligible risk that several faults(More)
— The huge investment in the design and production of multicore processors may be put at risk because the emerging highly miniaturized but unreliable fabrication technologies will impose significant barriers to the lifelong reliable operation of future chips. Extremely complex, massively parallel, multi-core processor chips fabricated in these technologies(More)
With continued CMOS scaling, future shipped hardware will be increasingly vulnerable to in-the-field faults. To be broadly deployable, the hardware reliability solution must incur low overheads, precluding use of expensive redundancy. We explore a cooperative hardware-software solution that watches for anomalous software behavior to indicate the presence of(More)
Decreasing hardware reliability is expected to impede the exploitation of increasing integration projected by Moore's Law. There is much ongoing research on efficient fault tolerance mechanisms across all levels of the system stack, from the device level to the system level. High-level fault tolerance solutions, such as at the microarchitecture and system(More)
As devices continue to scale, future shipped hardware will likely fail due to in-the-field hardware faults. As traditional redundancy-based hardware reliability solutions that tackle these faults will be too expensive to be broadly deployable, recent research has focused on low-overhead reliability solutions. One approach is to employ low-overhead ("(More)
In the near future, hardware is expected to become increasingly vulnerable to faults due to continuously decreasing feature size. Software-level symptoms have previously been used to detect permanent hardware faults. However, they can not detect a small fraction of faults, which may lead to Silent Data Corruptions(SDCs). In this paper, we present a system(More)
Continued technology scaling is resulting in systems with billions of devices. Unfortunately, these devices are prone to failures from various sources, resulting in even commodity systems being affected by the growing reliability threat. Thus, traditional solutions involving high redundancy or piecemeal solutions targeting specific failure modes will no(More)
A method for Statistical Fault Injection (SFI) into arbitrary latches within a full system hardware-emulated model is validated against particle-beam-accelerated SER testing for a modern microprocessor. As performed on the IBM POWER6 microprocessor, SFI is capable of distinguishing between error handling states associated with the injected bit flip.(More)
—Future microprocessors need low-cost reliability solutions to enable reliable operations in the presence of failure-prone devices. The state-of-the-art reliability solutions detect the presence of hardware faults by deploying low-cost software-level symptom monitors. Recently researchers have shown that these detection mechanisms provide high fault(More)
— This work concerns metrics for evaluating microar-chitectural enhancements to improve processor lifetime reliability. A commonly reported reliability metric is mean time to failure (MTTF). Although the MTTF metric is simpler to evaluate, it does not provide information on the reliability characteristics during the relatively short operational life of(More)