R-MESHJOIN for near-real-time data warehousing

Abstract

To meet the growing business demand for up-to-date information, current data integration approaches are moving towards real-time updates. One important element in real-time data integration is the join of a continuously arriving data stream with a disk-based relation. In this paper we investigate a stream-based join algorithm called mesh join (MESHJOIN) and propose an improved version called reduced MESHJOIN (R-MESHJOIN). Both algorithms tune their memory use by allocating parts of the available memory to the key join components. In MESHJOIN there is a dependency between the size of the partitions in an internal queue for the stream data and the number of iterations required to bring the disk-based relation into memory. This dependency hampers the optimal distribution of memory among the join components. In particular, the size of the disk-buffer varies with the size of the disk-based relation, which is unnecessary. R-MESHJOIN, by contrast, removes this dependency, which enables an optimal distribution of the available memory among the join components. In R-MESHJOIN, a change in the size of the disk-based relation does not affect the size of the disk-buffer. An experimental study is conducted to validate these arguments.
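The coupling described in the abstract can be illustrated with a small in-memory sketch of the general mesh-join idea. This is not the authors' implementation: the function name `mesh_join` and the parameters `buf_size` and `batch_size` are hypothetical, the relation is a Python list rather than a disk file, and the sketch indexes each relation partition instead of hashing the stream tuples. It only shows that every stream batch must stay queued until it has met all `k` partitions of the relation, so the queue length is tied to the relation size and the buffer size.

```python
from collections import deque
from itertools import islice


def mesh_join(stream, relation, key_s, key_r, buf_size, batch_size):
    """Simplified mesh-join sketch (illustrative assumption, not the paper's code).

    The relation is split into partitions of `buf_size` tuples that are
    loaded cyclically into a "disk buffer". Each stream batch waits in a
    queue until it has been joined with every partition exactly once.
    """
    partitions = [relation[i:i + buf_size]
                  for i in range(0, len(relation), buf_size)]
    k = len(partitions)          # iterations needed to see the whole relation
    if k == 0:
        return []

    queue = deque()              # entries: [batch, partitions still to meet]
    results = []
    it = iter(stream)
    p = 0                        # index of the next partition to load
    exhausted = False

    while not exhausted or queue:
        batch = list(islice(it, batch_size))
        if batch:
            queue.append([batch, k])
        else:
            exhausted = True

        # "Load" the next partition and index it by join key.
        index = {}
        for r in partitions[p]:
            index.setdefault(key_r(r), []).append(r)
        p = (p + 1) % k

        # Join every queued batch with the partition currently in the buffer.
        for entry in queue:
            for s in entry[0]:
                for r in index.get(key_s(s), []):
                    results.append((s, r))
            entry[1] -= 1

        # The oldest batch leaves once it has met all k partitions.
        while queue and queue[0][1] == 0:
            queue.popleft()

    return results


stream = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
relation = [(1, "X"), (2, "Y"), (4, "Z")]
joined = mesh_join(stream, relation,
                   key_s=lambda s: s[0], key_r=lambda r: r[0],
                   buf_size=2, batch_size=1)
```

In this sketch the queue holds at most `k = ceil(len(relation) / buf_size)` batches, so changing the relation size or the buffer size changes how much stream data must be retained. That is the kind of dependency the abstract says R-MESHJOIN removes; how it does so is not described in this text and is not modelled here.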

DOI: 10.1145/1871940.1871952

Cite this paper

@inproceedings{Naeem2010RMESHJOINFN,
  title={R-MESHJOIN for near-real-time data warehousing},
  author={M. Asif Naeem and Gillian Dobbie and Gerald Weber and Shafiq Alam},
  booktitle={DOLAP},
  year={2010}
}