Reasoning about the Reliability of Diverse Two-Channel Systems in Which One Channel Is "Possibly Perfect"

@article{Littlewood2012ReasoningAT,
  title={Reasoning about the Reliability of Diverse Two-Channel Systems in Which One Channel Is "Possibly Perfect"},
  author={Bev Littlewood and John M. Rushby},
  journal={IEEE Transactions on Software Engineering},
  year={2012},
  volume={38},
  pages={1178--1194}
}
This paper refines and extends an earlier one by the first author [1]. It considers the problem of reasoning about the reliability of fault-tolerant systems with two “channels” (i.e., components) of which one, A, because it is conventionally engineered and presumed to contain faults, supports only a claim of reliability, while the other, B, by virtue of extreme simplicity and extensive analysis, supports a plausible claim of “perfection.” We begin with the case where either channel can bring… 
Conservative Reasoning about the Probability of Failure on Demand of a 1-out-of-2 Software-Based System in Which One Channel Is "Possibly Perfect"
TLDR
The work reported here avoids the well-known difficulty that for two certainly-fallible channels, failures of the two will be dependent, i.e., the system pfd cannot be expressed simply as a product of the channel pfds.
Conservative reasoning about epistemic uncertainty for the probability of failure on demand of a 1-out-of-2 software-based system in which one channel is “possibly perfect”
In earlier work (Littlewood and Rushby 2011; henceforth LR), an analysis was presented of a 1-out-of-2 system in which one channel was “possibly perfect”. It was shown that, at the aleatory level,
On the probability of perfection of software-based systems
TLDR
This thesis provides three parallel sets of (quasi-)perfection models, each of which could be used individually as a conservative end-to-end argument that reasons from various types of evidence to the reliability of a software-based system.
Conservative claims about the probability of perfection of software-based systems
TLDR
This paper considers the difficult problem of expressing prior beliefs about the probability of failure on demand and representing them mathematically. It assumes that, although the assessor cannot provide a full probabilistic description of their uncertainty in a single distribution, they can express some precise but partial belief about the unknowns.
"Validation of ultra-high dependability…" – 20 years on
TLDR
The 20th anniversary of the SCSC falls about 20 years after the original article, so it seems a good time to revisit the article briefly and see where the debate about these issues now stands.
Conservative Confidence Bounds in Safety, from Generalised Claims of Improvement & Statistical Evidence
TLDR
This work proposes a formal probabilistic (Bayesian) organisation for “Proven-in-use”, “globally-at-least-equivalent” and “stress-tested” arguments, and demonstrates scenarios in which formalising such arguments substantially increases confidence in the target system.
Software Fault-Freeness and Reliability Predictions
TLDR
This work addresses how to combine evidence concerning probability of failure together with evidence pertaining to likelihood of fault-freeness, in a Bayesian framework, and guarantees reliability predictions that are conservative (err on the side of pessimism), despite the difficulty of stating prior probability distributions for reliability parameters.

References

The Use of Proof in Diversity Arguments
TLDR
It is shown that assessment of the reliability of the overall fault-tolerant system in this case may take advantage of claims for independence that are more plausible than those involved in design diversity.
An experimental evaluation of the assumption of independence in multiversion programming
TLDR
N-version programming has been proposed as a method of incorporating fault tolerance into software. The experiment revealed that the programs were individually extremely reliable, but that the number of tests in which more than one program failed was substantially higher than expected.
A Theoretical Basis for the Analysis of Multiversion Software Subject to Coincident Errors
TLDR
A condition under which a multiversion system is a better strategy than relying on a single version is given, and some differences between the coincident errors model developed here and the model that assumes independent failures of component versions are studied.
Conceptual Modeling of Coincident Failures in Multiversion Software
TLDR
The authors formalize the notion of methodological diversity by considering the sequence of decision outcomes that constitute a methodology and show that diversity of decision implies likely diversity of behavior for the different versions developed under such forced diversity.
Modeling the Effects of Combining Diverse Software Fault Detection Techniques
TLDR
It is shown that many of these results for design diversity have counterparts in diverse fault detection in a single software version, and it is possible for effectiveness to be even greater than it would be under an assumption of statistical independence.
Byzantine Fault Tolerance, from Theory to Reality
TLDR
This paper revisits the Byzantine problem from a practitioner's perspective to provide the reader with a working appreciation of the Byzantine failure from a practical as well as a theoretical perspective.
Modelling the Effects of Combining Diverse Software Fault Detection Techniques
TLDR
This paper defines measures of fault finding effectiveness, and of diversity, and shows how these might be used to give guidance for the optimal application of different fault finding procedures to a particular program.
Validation of ultrahigh dependability for software-based systems
TLDR
It appears that engineering practice must take into account the fact that no solution exists, at present, for the validation of ultra-high dependability in systems relying on complex software.
Fail-stop processors: an approach to designing fault-tolerant computing systems
A methodology that facilitates the design of fault-tolerant computing systems is presented. It is based on the notion of a fail-stop processor. Such a processor automatically halts in response to any
Formal Verification for Fault-Tolerant Architectures: Prolegomena to the Design of PVS
TLDR
The verifications performed, the lessons learned, and some of the design decisions taken in PVS are described to better support these large, difficult, iterative, and collaborative verifications.