Vertical Data Migration in Large Near-Line Document Archives Based on Markov-Chain Predictions

Abstract

Large multimedia document archives hold most of their data in near-line tertiary storage libraries for cost reasons. This paper develops an integrated approach to the vertical data migration between the tertiary and secondary storage in that it reconciles speculative preloading, to mask the high latency of the tertiary storage, with the replacement policy of the secondary storage. In addition, it considers the interaction of these policies with the tertiary storage scheduling and controls preloading aggressiveness by taking contention for tertiary storage drives into account. The integrated migration policy is based on a continuous-time Markov-chain (CTMC) model for predicting the expected number of accesses to a document within a specified time horizon. The parameters of the CTMC model, the probabilities of co-accessing certain documents and the interaction times between successive accesses, are dynamically estimated and adjusted to evolving workload patterns by keeping online statistics. The integrated policy for vertical data migration has been implemented in a prototype system. Detailed simulation studies with Web-server-like synthetic workloads indicate significant gains in terms of client response time. The studies also show that the overhead of the statistical bookkeeping and the computations for the access predictions is affordable.
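The abstract describes the prediction step only in outline. The CTMC parameters (co-access probabilities and the times between successive accesses) are estimated from online statistics and then used to predict the expected number of accesses to a document within a time horizon. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The class and method names (`OnlineCTMC`, `observe`, `rate`, `expected_accesses`) are hypothetical. The sketch assumes that each document is a state and that the transition rate is q_ij = p_ij / (mean delay after an access to i), which implies exponential sojourn times. It integrates the transient state probabilities with forward Euler steps for brevity. Aging of statistics, preloading, replacement, and drive scheduling are not modeled.

```python
from collections import defaultdict


class OnlineCTMC:
    """Hypothetical sketch of an access-prediction CTMC over document ids.

    Assumed estimators, kept as running online counts:
      p[i][j] = fraction of observed departures from i that went to j
      mean[i] = average delay between an access to i and the next access
      q[i][j] = p[i][j] / mean[i]   (assumed transition rate i -> j)
    """

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # i -> j -> count
        self.departures = defaultdict(int)                   # i -> count
        self.delay_sum = defaultdict(float)                  # i -> summed delays

    def observe(self, src, dst, delay):
        """Record that an access to `dst` followed one to `src` after `delay`."""
        self.counts[src][dst] += 1
        self.departures[src] += 1
        self.delay_sum[src] += delay

    def rate(self, i, j):
        """Estimated rate q_ij; 0 if nothing has been observed leaving i."""
        n = self.departures[i]
        if n == 0 or self.delay_sum[i] <= 0.0:
            return 0.0
        p_ij = self.counts[i][j] / n
        mean_delay = self.delay_sum[i] / n
        return p_ij / mean_delay

    def states(self):
        """All document ids seen as a source or a destination."""
        s = set(self.departures)
        for i in list(self.counts):
            s.update(self.counts[i])
        return list(s)

    def expected_accesses(self, start, target, horizon, steps=1000):
        """Expected number of accesses to `target` in (0, horizon], given a
        current access to `start`. Integrates sum_i pi_i(t) * q_i,target over
        time, where pi(t) follows the Kolmogorov forward equations (Euler)."""
        states = self.states()
        if start not in states:
            states.append(start)
        q = {i: {j: self.rate(i, j) for j in states} for i in states}
        pi = {i: 0.0 for i in states}
        pi[start] = 1.0
        dt = horizon / steps
        expected = 0.0
        for _ in range(steps):
            # Rate of jumps into `target` now (a self-loop counts as a re-access).
            expected += dt * sum(pi[i] * q[i].get(target, 0.0) for i in states)
            nxt = dict(pi)
            for i in states:
                if pi[i] == 0.0:
                    continue
                for j, r in q[i].items():
                    if j != i and r > 0.0:  # self-loops do not move probability
                        flow = dt * pi[i] * r
                        nxt[i] -= flow
                        nxt[j] += flow
            pi = nxt
        return expected


model = OnlineCTMC()
model.observe("A", "B", 2.0)  # B was accessed 2 time units after A
print(model.expected_accesses("A", "B", horizon=2.0))
```

In this single-transition example the exact answer is 1 - e^(-1). The Euler sketch prints a value close to it, about 0.632. In a migration policy along the lines the abstract describes, a prediction like this would be compared across candidate documents to decide which ones to preload from tertiary storage and which ones to keep on disk. How the paper actually combines these predictions with drive contention is not shown here.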

5 Figures and Tables

Cite this paper

@inproceedings{Kraiss1997VerticalDM,
  title={Vertical Data Migration in Large Near-Line Document Archives Based on Markov-Chain Predictions},
  author={Achim Kraiss and Gerhard Weikum},
  booktitle={VLDB},
  year={1997}
}