Failure Assessment

  • Published 2006

Abstract

Three questions to which software developers want accurate, precise answers are "How can the software system fail?", "What bad things will happen if the software fails?", and "How many failures will the software experience?". Numerous techniques have been devised to answer these questions; three of the best known are: 1) Software Fault Tree Analysis (SFTA), 2) Software Failure Modes, Effects, and Criticality Analysis (SFMECA), and 3) Software Fault/Failure Modeling. SFTA and SFMECA have been successfully used to analyze the flight software for a number of robotic planetary exploration missions, including Galileo, Cassini, and Deep Space 1. Given the increasing interest in reusing software components from mission to mission, one of us has developed techniques for reusing the corresponding portions of the SFTA and SFMECA, reducing the effort required to conduct these analyses. SFTA has also been shown to be effective in analyzing the security aspects of software systems; intrusion mechanisms and effects can easily be modeled using these techniques. The Bi-Directional Safety Analysis (BDSA) method combines a forward search (similar to SFMECA) from potential failure modes to their effects, with a backward search (similar to SFTA) from feasible hazards to the contributing causes of each hazard. BDSA offers an efficient way to identify latent failures. Recent work has extended BDSA to product-line applications such as flight-instrumentation displays and developed tool support for the reuse of the failure-analysis artifacts within a product line. BDSA has also been streamlined to support those projects having tight cost and/or schedule constraints for their failure analysis efforts. We discuss lessons learned from practice, describe available tools, and identify some future directions for the topic.
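The forward and backward searches that BDSA combines can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the cause-to-effect graph, its node names, and the helper functions `forward_effects` and `backward_causes` are hypothetical and are not taken from the paper or its tools. The sketch only shows the two search directions on a toy graph; it is not the BDSA method itself.

```python
from collections import deque

# Hypothetical cause -> effect edges for a toy system (illustrative only).
EDGES = {
    "sensor_stale_data": ["bad_attitude_estimate"],
    "bad_attitude_estimate": ["wrong_thruster_command"],
    "wrong_thruster_command": ["loss_of_pointing"],
    "telemetry_overflow": ["dropped_packets"],
}


def reachable(start, graph):
    """Breadth-first search: every node reachable from start via graph edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph):
    """Build the effect -> cause graph from a cause -> effect graph."""
    rev = {}
    for cause, effects in graph.items():
        for effect in effects:
            rev.setdefault(effect, []).append(cause)
    return rev


def forward_effects(failure_mode):
    """Forward search (SFMECA-like): failure mode to its downstream effects."""
    return reachable(failure_mode, EDGES)


def backward_causes(hazard):
    """Backward search (SFTA-like): hazard to its contributing causes."""
    return reachable(hazard, reverse(EDGES))


if __name__ == "__main__":
    print(sorted(forward_effects("sensor_stale_data")))
    print(sorted(backward_causes("loss_of_pointing")))
```

In this toy graph, a failure mode that appears in the backward search from a hazard is one whose forward effects include that hazard, which is the kind of cross-check the two directions make possible.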
A substantial amount of research has been devoted to estimating the number of failures that a software system will experience during test and operations, as well as the number of faults that have been inserted into that system during its development. One of us has found that the amount of structural change to a system during its development is strongly related to the number of faults inserted into it. Using techniques requiring no additional effort on the part of the development organization, the required measurements of structural evolution can be easily obtained from a development effort's configuration management system and readily transformed into an estimate of fault content. So far, structure-fault relationships have been identified for source code; current work seeks to examine artifacts available earlier in the lifecycle to determine if similar relationships between structure and fault content can be found. In particular, relationships between requirements change requests and the number of faults inserted into the implemented system would provide a significant improvement in our ability to control software quality during the early development phases.
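As a rough illustration of turning structural-change measurements into a fault-content estimate, the sketch below assumes a simple proportional model, with faults inserted approximated as a constant times a per-build change measure. That model, the function names `fit_proportionality` and `estimate_faults`, and the numbers in the usage section are all assumptions made for this example; the abstract does not specify the actual measure or model used.

```python
def fit_proportionality(change, faults):
    """Least-squares fit of faults ~ k * change (line through the origin).

    change: per-build structural-change measurements (assumed to come from
            a configuration management system; the unit is hypothetical).
    faults: faults later attributed to each of those builds.
    """
    if len(change) != len(faults):
        raise ValueError("change and faults must have the same length")
    denominator = sum(c * c for c in change)
    if denominator == 0:
        raise ValueError("at least one build must have nonzero change")
    numerator = sum(c * f for c, f in zip(change, faults))
    return numerator / denominator


def estimate_faults(k, change_per_build):
    """Apply the fitted constant to new change measurements."""
    return [k * c for c in change_per_build]


if __name__ == "__main__":
    # Hypothetical calibration data, not results from the paper.
    historical_change = [100, 200, 300]
    historical_faults = [2, 4, 6]
    k = fit_proportionality(historical_change, historical_faults)
    print(estimate_faults(k, [50, 150]))
```

A fit through the origin is used here because a build with no structural change is assumed to insert no faults; a real calibration would need to check that assumption against project data.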
