Michael Thomas

Data distribution, storage and access are essential to CPU-intensive and data-intensive high performance Grid computing. A newly emerged file system, the Hadoop Distributed File System (HDFS), is deployed and tested within the Open Science Grid (OSG) middleware stack. Efforts have been made to integrate HDFS with other Grid tools to build a complete service…
The use of meta-schedulers for resource management in large-scale distributed systems often leads to a hierarchy of schedulers. In this paper, we discuss why existing meta-scheduling hierarchies are sometimes not sufficient for Grid systems due to their inability to re-organise jobs already scheduled locally. Such a job re-organisation is required to adapt…
Clarens is a Grid-enabled web service infrastructure implemented to augment the current batch-oriented Grid services computing model in the Compact Muon Solenoid (CMS) experiment of the LHC. Clarens servers leverage the Apache web server to provide a scalable framework for clients to communicate with services using the SOAP and XML-RPC protocols. This…
High energy physics (HEP) and other scientific communities have adopted service oriented architectures (SOA) as part of a larger grid computing effort. This effort involves the integration of many legacy applications and programming libraries into a SOA framework. The grid analysis environment (GAE) (Lingen et al., 2004) is such a service oriented…
Large scientific collaborations are moving towards service oriented architectures for implementation and deployment of globally distributed systems. Clarens is a high-performance, easy-to-deploy Web service framework that supports the construction of such globally distributed systems. This paper discusses some of the core functionality of Clarens that the…
Lambda Station is an ongoing project of Fermi National Accelerator Laboratory and the California Institute of Technology. The goal of this project is to design, develop and deploy network services for path selection, admission control and flow-based forwarding of traffic among data-intensive Grid applications such as are used in High Energy Physics and…
We present a data transfer system for the grid environment built on top of the open source FDT tool (Fast Data Transfer) developed by Caltech in collaboration with the National University of Science and Technology (Pakistan). The enhancement layer above FDT consists of a client program, fdtcp (FDT copy), and an fdtd service (FDT daemon). This pair of…
The Grid Analysis Environment (GAE), which is a continuation of the CAIGEE project [5], is an effort to develop, integrate and deploy a system for distributed analysis. The current focus within the GAE is on the CMS experiment [1]; however, the GAE design abstracts from any specific scientific experiment and focuses on scientific analysis in general. The GAE…
The concept of coupling geographically distributed resources for solving large scale problems is becoming increasingly popular, forming what is commonly called grid computing. Management of resources in the Grid environment becomes complex as the resources are geographically distributed, heterogeneous in nature and owned by different individuals and…
In Grids, scheduling decisions are often made on the basis of jobs being either data or computation intensive: in data intensive situations jobs may be pushed to the data, and in computation intensive situations data may be pulled to the jobs. This kind of scheduling, in which there is no consideration of network characteristics, can lead to performance…