A census of Tandem system availability between 1985 and 1990

  • J. Gray
  • Published 1990
  • Engineering
  • IEEE Transactions on Reliability
A census of customer outages reported to Tandem showing a clear improvement in the reliability of hardware and maintenance has been taken. It indicates that software is now the major source of reported outages (62%), followed by system operations (15%). This is a dramatic shift from the statistics for 1985. Even after discounting systematic underreporting of operations and environmental outages, the conclusion is clear: hardware faults and hardware maintenance are no longer a major source of…


Choosing from redundant designs of power systems using system outage rate and cost
The push for high availability is on. As customers keep telling computer manufacturers, any outage to their operations is unacceptable. For those involved at the design stage, this usually…
Analysis of software halts in the tandem GUARDIAN operating system
  • I. Lee, R. Iyer
  • Engineering, Computer Science
  • [1992] Proceedings Third International Symposium on Software Reliability Engineering
  • 1992
The results show that the occurrences of software halts are not correlated with each other in time and fault tolerance in the measured system was shown to reduce the service loss by nearly 90%.
An analysis of client/server outage data
  • A. Wood
  • Computer Science
  • Proceedings of 1995 IEEE International Computer Performance and Dependability Symposium
  • 1995
This paper examines client/server outage data and presents a list of outage causes extracted from the data, which include hardware, software, operations, and environmental failures, as well as outages due to planned reconfigurations, to predict availability in a typical client/server environment and to evaluate various fault-tolerant architectures.
High-availability computer systems
The techniques used to build highly available computer systems are sketched, and the use of pairs of computer systems at separate locations to guard against unscheduled outages due to outside sources (communication or power failures, earthquakes, etc.) is addressed.
Analysis of Preventive Maintenance in Transactions Based Software Systems
An analytical model of a software system which serves transactions is presented and expressions for resulting steady state availability, probability that an arriving transaction is lost and an upper bound on the expected response time of a transaction are derived.
2.4.4 Prediction of Information System Availability in Mission Critical and Business Critical Applications
One of the most important attributes of on-line computer systems that are performing mission or business critical applications is availability. System Engineers are often called upon to predict the…
Dependability and Performance Measures for the Database Practitioner
We estimate the availability, reliability, and mean transaction time (response time) for repairable database configurations, centralized or distributed, in which each service component is…
A study of the reliability of Internet sites
By applying an appropriate test statistic, some samples were found to have a realistic chance of being drawn from an exponential distribution, while others can be confidently classed as nonexponential.
Measurement and Analysis of Failures in Computer Systems
A study of software failures spanning several different releases of Tandem's NonStop-UX operating system running on Tandem Integrity S2(TMR) systems, focusing primarily on those TPRs that report a UNIX panic that subsequently crashes the system.
System Support for Software Fault Tolerance in Highly Available Database Management Systems
The dissertation describes modifications to the storage system that improve its performance in environments with high update rates and adds to the fast recovery capabilities of POSTGRES with two techniques for maintaining B-tree index consistency without log processing.


Tandem's remote data facility
  • J. Lyon
  • Computer Science
  • Digest of Papers Compcon Spring '90. Thirty-Fifth IEEE Computer Society International Conference on Intellectual Leverage
  • 1990
RDF allows an organization to maintain a geographically remote backup system with an up-to-date copy of the database so that, should the first system fail, the second system can rapidly take over the workload, minimizing downtime.
Why Do Computers Stop and What Can Be Done About It?
  • J. Gray
  • Computer Science
  • Symposium on Reliability in Distributed Software and Database Systems
  • 1986
It is pointed out that faults in production software are often soft (transient) and that a transaction mechanism combined with persistent process pairs provides fault-tolerant execution, the key to software fault tolerance.
Software Fault Tolerance
The principal models, specification, building, evaluation, and system integration of fault-tolerant software are discussed, and goals for future work are discussed.
The N-Version Approach to Fault-Tolerant Software
  • A. Avizienis
  • Computer Science
  • IEEE Transactions on Software Engineering
  • 1985
Principal requirements for the implementation of N-version software are summarized and the DEDIX distributed supervisor and testbed for the execution of N-version software is described.
Probabilistic Logic and the Synthesis of Reliable Organisms from Unreliable Components
The paper that follows is based on notes taken by Dr. R. S. Pierce on five lectures given by the author at the California Institute of Technology in January 1952, and it is the author's conviction that error should be treated by thermodynamic methods, and be the subject of a thermodynamical theory.
Dissecting software failures
  • Hewlett-Packard Journal
  • 1989
Learning from field experience with fault tolerant systems
  • Proc. Int'l Workshop Hardware Fault Tolerance in Multiprocessors (at University of Illinois
  • 1989
Powering computer-controlled systems: AC or DC?
  • Telesis
  • 1984
Probabilistic Logics and the Synthesis of Reliable Organisms From Unreliable Components
  • Automata Studies
  • 1956
If the UPS is present but fails, then the UPS failure is the fatal fault