Predicting Sporadic Grid Data Transfers


The increasingly common practice of (1) replicating datasets and (2) using resources as distributed data stores in Grid environments has led to the problem of determining which replica can be accessed most efficiently. Because the components along the end-to-end path linking these locations differ in performance characteristics and load, selecting one replica location from among many requires accurate predictions of end-to-end data transfer times between sources and sinks. In this paper, we present a prediction system that combines end-to-end application throughput observations with network load variations, drawing on their respective merits: the former captures whole-system performance and the latter captures variations in load patterns. We develop a set of regression models to derive predictions that characterize the effect of network load variations on file transfer times. We apply these techniques to the GridFTP data movement tool, part of the Globus Toolkit™, and observe gains of up to 10% in prediction accuracy compared with approaches based on past system behavior in isolation.
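
To make the idea of regressing transfer performance on network load concrete, the following is a minimal sketch, not the paper's actual models, which the abstract does not specify. It assumes a simple linear relationship between a network bandwidth probe and the throughput observed for past transfers. The function names (`fit_linear`, `predict_transfer_time`), the units, and the sample values are all hypothetical.

```python
from statistics import mean


def fit_linear(probe_bw, throughput):
    """Ordinary least squares fit of throughput = a + b * probe_bw.

    probe_bw:   network probe bandwidth readings (hypothetical, MB/s)
    throughput: observed end-to-end transfer throughput (hypothetical, MB/s)
    """
    if len(probe_bw) != len(throughput) or len(probe_bw) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(probe_bw), mean(throughput)
    sxx = sum((x - x_bar) ** 2 for x in probe_bw)
    if sxx == 0:
        raise ValueError("probe readings are all identical; slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(probe_bw, throughput))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def predict_transfer_time(file_size_mb, current_probe_bw, model):
    """Predict transfer time in seconds from the current probe reading."""
    a, b = model
    predicted_throughput = a + b * current_probe_bw
    if predicted_throughput <= 0:
        raise ValueError("model predicts non-positive throughput")
    return file_size_mb / predicted_throughput


# Hypothetical history: probe readings paired with observed throughput.
history_probe = [10, 20, 30, 40]
history_throughput = [7, 12, 17, 22]

model = fit_linear(history_probe, history_throughput)
estimate = predict_transfer_time(120, 20, model)
```

With one such model per candidate source, a replica selector could compare the predicted transfer times and choose the smallest, which is the use case the abstract motivates.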

DOI: 10.1109/HPDC.2002.1029918

77 Citations

Cite this paper

@inproceedings{Vazhkudai2002PredictingSG,
  title={Predicting Sporadic Grid Data Transfers},
  author={Sudharshan S. Vazhkudai and Jennifer M. Schopf},
  booktitle={HPDC},
  year={2002}
}