Joao Carreira

This paper presents Xception, a software fault injection and monitoring environment. Xception uses the advanced debugging and performance monitoring features present in most modern processors to inject more realistic faults by software, and to monitor in detail the activation of the faults and their impact on the behaviour of the target system. Faults are …
An important step in the development of dependable systems is the validation of their fault tolerance properties. Fault injection has been widely used for this purpose; however, with the rapid increase in processor complexity, traditional techniques are becoming increasingly difficult to apply. This paper presents a new software-implemented fault injection …
In the research reported in this paper, transient faults were injected, using software fault injection, into the nodes and into the communication subsystem of a commercial parallel machine running several real applications. The results showed that a significant percentage of faults caused the system to produce wrong results while the application seemed to …
This paper addresses the problem of injecting faults into the communication system of disjoint-memory parallel computers, and presents fault injection results showing that 5% to 30% of the faults injected into the communication subsystem of a commercial parallel computer caused undetected errors that led the application to generate erroneous results. All …
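To illustrate the general idea of injecting transient faults into a communication channel and spotting undetected errors by comparison with a fault-free ("golden") run, here is a minimal Python simulation. It is not the authors' injector, which ran on a real parallel machine; the message format (a vector of 32-bit integers), the single-bit-flip fault model, the toy byte checksum, and every function name (`flip_bit`, `inject_once`, `campaign`, etc.) are hypothetical choices made only for this sketch.

```python
import random
import struct
from collections import Counter

def pack_ints(values):
    """Encode a list of 32-bit signed integers as a little-endian message."""
    return bytearray(struct.pack(f"<{len(values)}i", *values))

def unpack_ints(msg):
    """Decode a message produced by pack_ints."""
    return list(struct.unpack(f"<{len(msg) // 4}i", bytes(msg)))

def byte_checksum(msg):
    """Toy end-to-end check (hypothetical): sum of all bytes modulo 256."""
    return sum(msg) % 256

def flip_bit(msg, bit):
    """Transient fault model: invert a single bit of the message in place."""
    msg[bit // 8] ^= 1 << (bit % 8)

def inject_once(payload, rng, use_checksum):
    """Send payload through a faulty 'channel' and classify the outcome
    against a fault-free (golden) run of the same computation."""
    golden = sum(payload)                       # reference result, no fault
    msg = pack_ints(payload)
    check = byte_checksum(msg)                  # computed by the sender
    flip_bit(msg, rng.randrange(len(msg) * 8))  # corrupt one random bit in transit
    if use_checksum and byte_checksum(msg) != check:
        return "detected"
    result = sum(unpack_ints(msg))              # the 'application' consumes the data
    return "correct" if result == golden else "undetected_wrong"

def campaign(trials=1000, use_checksum=False, seed=42):
    """Repeat single-fault experiments and tally the outcome classes."""
    rng = random.Random(seed)
    payload = list(range(16))
    return Counter(inject_once(payload, rng, use_checksum) for _ in range(trials))
```

In this toy every flipped bit lands in live data, so without the checksum every fault yields an undetected wrong result, and with it every fault is caught. On a real machine faults can also be masked or trip other error detectors, so the toy's numbers say nothing about the rates reported above; only the classify-against-golden-run method carries over.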
In this paper we present ParLin, a library for the PARIX operating system that provides support for Linda-like programming on transputer arrays. The primitives offered by this library are quite efficient owing to design decisions, explained in the paper, that lead to a very simple internal structure with a centralized Tuple Space. We claim …
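ParLin's actual API is not given in this listing, so the following is only a minimal Python sketch of generic Linda semantics (`out`, `in`, `rd`) over a single centralized tuple space, assuming a shared-memory setting with threads rather than transputers. The class and method names are hypothetical; `in_` carries a trailing underscore because `in` is a Python keyword, and `None` serves as an untyped wildcard, a simplification of Linda's typed formal fields.

```python
import threading

class TupleSpace:
    """Minimal Linda-style tuple space: one lock guards one central list."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *fields):
        """Deposit a tuple and wake any threads blocked in rd/in_."""
        with self._cond:
            self._tuples.append(tuple(fields))
            self._cond.notify_all()

    @staticmethod
    def _matches(template, tup):
        # None acts as an untyped wildcard ("formal"); any other value must be
        # equal to the corresponding field ("actual").
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def _find(self, template):
        """Index of the first matching tuple, or -1 (caller holds the lock)."""
        for i, tup in enumerate(self._tuples):
            if self._matches(template, tup):
                return i
        return -1

    def _wait(self, template, remove, timeout):
        """Block until a tuple matches, then return it (removing it if asked)."""
        with self._cond:
            found = self._cond.wait_for(lambda: self._find(template) >= 0, timeout)
            if not found:
                raise TimeoutError(f"no tuple matches {template!r}")
            i = self._find(template)
            return self._tuples.pop(i) if remove else self._tuples[i]

    def rd(self, *template, timeout=None):
        """Blocking read: return a matching tuple and leave it in the space."""
        return self._wait(template, remove=False, timeout=timeout)

    def in_(self, *template, timeout=None):
        """Blocking take: Linda's 'in' (renamed because 'in' is a keyword)."""
        return self._wait(template, remove=True, timeout=timeout)
```

For example, a worker thread can call `ts.in_("task", None)` and reply with `ts.out("result", x * x)`, while the master posts `ts.out("task", 7)` and collects the answer with `ts.in_("result", None)`. Keeping all tuples in one list behind one lock makes matching trivial, which echoes the simplicity argument above, but it also makes the space a single point of contention, the scalability concern that motivates partitioned designs such as Eilean.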
In this paper, we introduce the design of Eilean, a parallel library for MPI based on the Linda programming paradigm. It provides a scalable distribution of the Tuple Space through a hierarchical (or cluster) partitioning scheme, together with tuple-type-specific access and distribution policies. Portability of the library is achieved using the message-passing …
The development of efficient and portable parallel programming systems can be a complex and troublesome task. Although several portable environments are meant to be used as a support layer for higher-level programming systems, they all provide different features and different levels of functionality to the system programmer. In this paper we …
This paper addresses the evaluation of the dependability properties of distributed-memory parallel systems through fault injection. The most popular parallel computers are based on the distributed-memory architecture, in which loosely coupled processors communicate by message passing. Fault tolerance is an issue that increasingly concerns manufacturers and end …