Fault Tolerance in Distributed Real Time Environment: A Survey
- Neelutpol Gogoi, Lakshmi Prasad Saikia
AbstructMany real-time systems have both performance requirements and reliability requirements. Performance is usually measured in terms of the value in completing tasks on time. Reliability is evaluated by hardware and software failure models. In many situations, there are trade-offs between task performance and task reliability. Thus, a mathematical assessment of performance-reliability trade-offs is necessary to evaluate the performance of real-time fault-tolerance systems. Assuming that the reliability of task execution is achieved through task replication, we present an approach that mathematically determines the replication factor for tasks. Our approach is novel in that it is a task schedule based analysis rather than a state based analysis as found in other models. Because we use a task schedule based analysis, we can provide a fast method to determine optimal redundancy levels, we are not limited to hardware reliability given by constant failure rate functions as in most other models, and we hypothesize that we can more naturally integrate with on-line real-time scheduling than when state based techniques are used. In this work, the goal is to maximize the total performance index, which is a performance-related reliability measurement. We present a technique based on a continuous task model and show how it very closely approximates discrete models and tasks with varying characteristics.