This paper is published by ACM and requires payment for access.

R-MESHJOIN for near-real-time data warehousing

Abstract

To fulfill the increasing demand of business for the latest information, current data integration approaches are moving towards real-time updates. One important element in real-time data integration is the join of a continuous incoming data stream with a disk-based relation. In this paper we investigate a stream-based join algorithm, called mesh join (MESHJOIN), and propose an improved version called reduced MESHJOIN (R-MESHJOIN). Both algorithms tune the memory, allocating parts of the memory to key components. In MESHJOIN there is a dependency between the size of partitions in an internal queue for the stream data and the number of iterations required to bring the disk-based relation into memory. This dependency hampers the optimal distribution of memory among the join components. In particular the size of the disk-buffer varies with the size of the disk-based relation which is unnecessary. On the other hand the R-MESHJOIN algorithm removes this dependency. This enables an optimal distribution of available memory among the join components. In R-MESHJOIN a change in the size of the disk-based relation does not affect the size of the disk-buffer. An experimental study is conducted in order to validate the arguments.

DOI: 10.1145/1871940.1871952

Extracted Key Phrases

10 Figures and Tables

Unfortunately, ACM prohibits us from displaying non-influential references for this paper.

To see the full reference list, please visit http://dl.acm.org/citation.cfm?id=1871952.