Semantic Scholar uses AI to extract papers important to this topic.
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow…
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute…
Fundamentals.- Supervision and fault management of processes - tasks and terminology.- Reliability, Availability and…
To improve performance and reduce power, processor designers employ advances that shrink feature sizes, lower voltage levels…
This paper describes a general approach to constructing cooperative services that span multiple administrative domains. In such…
This paper speculates that technology trends pose new challenges for fault tolerance in microprocessors. Specifically, severely…
This paper provides a conceptual framework for expressing the attributes of what constitutes dependable and reliable computing…
1 Introduction.- Fault Prevention and Fault Tolerance.- Anticipated and Unanticipated Faults.- Book Aim.- References.- 2 System…
An Information Dispersal Algorithm (IDA) is developed that breaks a file F of length L…
The rapid progress in VLSI technology has reduced the cost of hardware, allowing multiple copies of low-cost processors to…