Abstract

The ability to access several datasets concurrently on heterogeneous storage devices is becoming increasingly important for data-intensive applications, including database, data mining, and data visualization systems. The basic problem faced by applications is that while datasets can reside on a variety of storage systems such as secondary, tertiary, and network storage, the CPU can only operate on memory-resident data. Practical solutions are required to allow applications to move datasets from heterogeneous storage devices into memory and back to the devices while maximizing data transfer efficiency and minimizing the amount of time the CPU waits for I/O. A key factor in achieving high data transfer efficiency is to exploit I/O concurrency. The continually increasing performance gap between CPUs and storage devices has made it imperative for the computer system to perform data transfers on several storage devices concurrently. Operating systems have traditionally attempted to increase I/O concurrency and reduce the amount of time the CPU waits for I/O by overlapping the CPU processing of one application with the I/Os of another (inter-application I/O concurrency). An alternative is to overlap the CPU processing of a single application with its own data transfers (intra-application I/O concurrency). Current trends in computing technology are making the latter approach increasingly important. The transition of processing from the mainframe to the desktop, changes in processor and storage device technologies, and the emergence of the World Wide Web are all contributing to the shift in the nature of I/O concurrency. The goal of the research presented in this thesis is to explore data buffering mechanisms and policies which allow applications to exploit I/O concurrency on heterogeneous storage devices. The central theme of the research is application-driven I/O concurrency.
The first part of this thesis focuses on the performance characteristics of storage devices and I/O interfaces. Next, we describe a buffer management system which allows applications to access data on heterogeneous devices efficiently, concurrently, and uniformly. In the third part of the thesis, we examine three applications which use the buffer manager: the DEVise data visualization system, a file sorting application, and a relational join.
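
As a rough illustration of intra-application I/O concurrency, the sketch below lets a background thread read ahead into a small bounded buffer while the main thread processes chunks that have already arrived. It is not the buffer manager described in the thesis; the names `prefetch_chunks` and `checksum`, the chunk size, and the buffer depth are all hypothetical choices made for this example.

```python
import io
import queue
import threading


def prefetch_chunks(stream, chunk_size, depth=2):
    """Yield chunks from `stream` while a background thread reads ahead.

    Hypothetical helper: `depth` bounds how many chunks may be buffered,
    so the reader stalls when the consumer falls behind.
    Error handling in the reader thread is omitted for brevity.
    """
    buffers = queue.Queue(maxsize=depth)
    done = object()  # sentinel marking end of stream

    def reader():
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                buffers.put(done)
                return
            buffers.put(chunk)  # blocks while the buffer is full

    threading.Thread(target=reader, daemon=True).start()
    while True:
        chunk = buffers.get()  # blocks only if no chunk is ready yet
        if chunk is done:
            return
        yield chunk


def checksum(stream, chunk_size=4096):
    """Toy CPU work that overlaps with the reader's I/O."""
    total = 0
    for chunk in prefetch_chunks(stream, chunk_size):
        total = (total + sum(chunk)) % 65521
    return total
```

Calling `checksum(open(path, "rb"))` on a real file would overlap the byte summation with the next read; one such reader per device would be one simple way to keep several devices busy at once, though the mechanisms and policies studied in the thesis may differ.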

Cite this paper

@inproceedings{Myllymaki1997ConcurrentDS, title={Concurrent Data Streams}, author={Jussi Myllymaki and David J. DeWitt and Raghu Ramakrishnan and Yannis E. Ioannidis and Rafael Lazimy}, year={1997} }