Zeno Greenwood

Adopting high-performance clusters and grid computing for mission-critical applications requires fault tolerance. In distributed systems, fault tolerance is commonly achieved through checkpoint-recovery and, in case of a system outage, job replication on alternative resources. The first approach depends on the system's MTTR …
The DØ experiment at Fermilab's Tevatron will record several petabytes of data over the next five years in pursuit of its goals of understanding nature and searching for the origin of mass. The computing resources required to analyze these data far exceed the capabilities of any one institution. Moreover, the widely scattered geographical distribution of DØ …
Physicists today employ grid technology to overcome various resource-level hurdles. The collective resource utilization achieved through grid computing is critical to the overall computing capacity of the community and should be guaranteed. In an environment where job sites are cluster systems, a single service node failure causes a whole-system outage. …