The Gamma Database Machine Project
TLDR
The design of the Gamma database machine and the techniques employed in its implementation are described, and a thorough performance evaluation of the iPSC/2 hypercube version of Gamma is presented.
Practical Skew Handling in Parallel Joins
TLDR
This work developed, implemented, and experimented with four new skew-handling parallel join algorithms. One of them, virtual processor range partitioning, was the clear winner in high-skew cases, while traditional hybrid hash join was the clear winner in low-skew or no-skew cases.
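The core idea of virtual processor range partitioning can be sketched as follows, assuming integer join keys held in a Python list and split values estimated from a random sample. The names `choose_splitters` and `vp_range_partition` and the parameters `vps_per_proc`, `sample_size`, and `seed` are hypothetical illustrations, not taken from the paper, and the sketch omits the join itself and the paper's handling of tuples from the second relation.

```python
import bisect
import random


def choose_splitters(sample, num_ranges):
    """Return num_ranges - 1 split values taken at evenly spaced quantiles of the sample."""
    s = sorted(sample)
    return [s[(i * len(s)) // num_ranges] for i in range(1, num_ranges)]


def vp_range_partition(keys, num_procs, vps_per_proc, sample_size=1000, seed=0):
    """Map each key to a processor using (simplified) virtual processor range partitioning.

    Keys are range-partitioned into num_procs * vps_per_proc virtual ranges whose
    split values are estimated from a random sample; the virtual ranges are then
    dealt round-robin to the real processors.
    """
    rng = random.Random(seed)
    sample = [rng.choice(keys) for _ in range(sample_size)]
    splitters = choose_splitters(sample, num_procs * vps_per_proc)
    assignment = {p: [] for p in range(num_procs)}
    for k in keys:
        vp = bisect.bisect_right(splitters, k)  # index of the virtual range holding k
        assignment[vp % num_procs].append(k)    # round-robin: virtual range -> processor
    return assignment


keys = [1] * 500 + list(range(1000))  # skewed input: key 1 is very frequent
parts = vp_range_partition(keys, num_procs=4, vps_per_proc=8)
```

Using many more virtual ranges than processors tends to spread a dense cluster of key values across several processors. In this simplified version, every copy of a single extremely frequent value still lands in one virtual range.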
LH*—a scalable, distributed data structure
TLDR
It is shown that LH* files can efficiently scale to files that are orders of magnitude larger in size than single-site files, and can be more efficient than any distributed file with a centralized directory, or a static parallel or distributed hash file.
A performance evaluation of four parallel join algorithms in a shared-nothing multiprocessor environment
TLDR
The Hybrid hash-join algorithm is found to be superior except when the join attribute values of the inner relation are non-uniformly distributed and memory is limited.
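A minimal, single-process sketch of the hybrid hash join idea follows, assuming relations are lists of tuples and that Python lists stand in for the disk files to which overflow buckets would be written. The function name and parameters (`key_b`, `key_p`, `num_buckets`) are hypothetical; memory sizing and the parallel, shared-nothing aspects studied in the paper are not modeled.

```python
from collections import defaultdict


def hybrid_hash_join(build, probe, key_b, key_p, num_buckets):
    """Equijoin: return r + s for every r in build, s in probe with r[key_b] == s[key_p]."""

    def bucket(value):
        return hash(value) % num_buckets

    # Partitioning pass over the build relation: bucket 0 goes straight into an
    # in-memory hash table, the other buckets are "spilled" (here: kept in lists).
    in_memory = defaultdict(list)
    spilled_build = defaultdict(list)
    for r in build:
        b = bucket(r[key_b])
        if b == 0:
            in_memory[r[key_b]].append(r)
        else:
            spilled_build[b].append(r)

    # Partitioning pass over the probe relation: bucket-0 tuples are joined
    # immediately, so that portion of the probe relation is never spilled.
    out = []
    spilled_probe = defaultdict(list)
    for s in probe:
        b = bucket(s[key_p])
        if b == 0:
            out.extend(r + s for r in in_memory.get(s[key_p], ()))
        else:
            spilled_probe[b].append(s)

    # Later passes: build a hash table per spilled bucket and probe it.
    for b in range(1, num_buckets):
        table = defaultdict(list)
        for r in spilled_build[b]:
            table[r[key_b]].append(r)
        for s in spilled_probe[b]:
            out.extend(r + s for r in table.get(s[key_p], ()))
    return out
```

The point of keeping bucket 0 resident is that its share of both relations is read and joined in a single pass, which is where hybrid hash join saves I/O over a plain partition-then-join approach.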
LH*: Linear Hashing for distributed files
TLDR
LH* generalizes Linear Hashing to parallel or distributed RAM and disk files, and can be much faster than a single-site disk file, can hold a much larger number of objects, or both.
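The single-site Linear Hashing addressing rule that LH* distributes can be sketched as below, assuming non-negative integer keys, hash functions of the form h_i(c) = c mod (N · 2^i), and a split of the bucket at the split pointer whenever any bucket overflows. The class and method names are hypothetical, and LH*'s client images, forwarding, and split coordinator are not shown.

```python
class LinearHashFile:
    """Single-site linear hashing with level i and split pointer n."""

    def __init__(self, n0=1, capacity=4):
        self.n0 = n0              # initial number of buckets N
        self.capacity = capacity  # bucket size that triggers a split
        self.level = 0            # i
        self.split = 0            # n, the next bucket to split
        self.buckets = [[] for _ in range(n0)]

    def _h(self, i, key):
        return key % (self.n0 * 2 ** i)

    def address(self, key):
        # Buckets below the split pointer have already been split at this
        # level, so their keys are addressed with the next hash function.
        a = self._h(self.level, key)
        if a < self.split:
            a = self._h(self.level + 1, key)
        return a

    def insert(self, key):
        bucket = self.buckets[self.address(key)]
        bucket.append(key)
        if len(bucket) > self.capacity:
            self._split_next()  # split bucket n, not necessarily the overflowing one

    def _split_next(self):
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])  # new bucket at index n + N * 2**i
        for k in old:
            self.buckets[self._h(self.level + 1, k)].append(k)
        self.split += 1
        if self.split == self.n0 * 2 ** self.level:
            self.split = 0       # every bucket of this level has been split
            self.level += 1

    def lookup(self, key):
        return key in self.buckets[self.address(key)]


f = LinearHashFile(n0=2, capacity=4)
for k in range(200):
    f.insert(k)
```

Because splits proceed in a fixed linear order, the pair (i, n) is enough to compute any key's address without a directory, which is the property LH* relies on to avoid a centralized directory.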
Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines
TLDR
This paper examines the problem of processing multi-way join queries through hash-based join methods in a shared-nothing database machine and demonstrates that right-deep scheduling strategies can provide significant performance advantages in large multiprocessor database machines under many circumstances, even when memory is limited.
RP*: A Family of Order Preserving Scalable Distributed Data Structures
TLDR
A family of ordered SDDSs, called RP*, is proposed, providing for ordered and dynamic files on multicomputers, and thus for more efficient processing of range queries and of ordered traversals of files.
Practical selectivity estimation through adaptive sampling
TLDR
This paper extends the previous analysis to provide significantly improved bounds on the amount of sampling necessary for a given level of accuracy and provides “sanity bounds” to deal with queries for which the underlying data is extremely skewed or the query result is very small.
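The general flavor of adaptive sampling for selectivity estimation can be illustrated with a simple stopping rule: sample tuples until enough qualifying tuples have been seen, with a cap on the total sample size for very selective predicates. This is a hypothetical simplification; the function name, `target_hits`, and `max_samples` are assumptions, and the paper's actual stopping conditions, error bounds, and "sanity bounds" are not reproduced here.

```python
import random


def adaptive_selectivity(relation, predicate, target_hits=100, max_samples=10_000, seed=0):
    """Estimate the fraction of tuples satisfying predicate.

    Samples with replacement until target_hits qualifying tuples are seen or
    max_samples tuples have been drawn, whichever comes first.
    Returns (estimate, number_of_samples_drawn).
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    rng = random.Random(seed)
    hits = samples = 0
    while hits < target_hits and samples < max_samples:
        if predicate(rng.choice(relation)):
            hits += 1
        samples += 1
    return hits / samples, samples


rel = list(range(1000))
estimate, drawn = adaptive_selectivity(rel, lambda x: x < 100, target_hits=500)
```

Stopping on a hit count rather than a fixed sample size means common predicates are estimated cheaply, while rare predicates automatically draw more samples until the cap is reached.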
An Evaluation of Non-Equijoin Algorithms
TLDR
A comparison between the partitioned band join algorithm and the classical sort-merge join algorithm is presented, together with data from speedup and scaleup experiments demonstrating that the partitioned band join is efficiently parallelizable.
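A partitioned band join can be sketched as follows, assuming a symmetric band |r - s| <= width over integer keys and split values supplied by the caller (for example, from a sample). Each R key is routed to exactly one range partition, and each S key is replicated to every partition whose range intersects [s - width, s + width], so each matching pair is found exactly once. The function name and parameters are hypothetical, and the partitions are processed sequentially here rather than on separate nodes.

```python
import bisect


def partitioned_band_join(r_keys, s_keys, width, splitters):
    """Return all pairs (r, s) with |r - s| <= width using range partitioning."""
    nparts = len(splitters) + 1
    r_parts = [[] for _ in range(nparts)]
    s_parts = [[] for _ in range(nparts)]
    for r in r_keys:
        # Each R key goes to exactly one partition.
        r_parts[bisect.bisect_right(splitters, r)].append(r)
    for s in s_keys:
        # Replicate s to every partition overlapping [s - width, s + width].
        lo = bisect.bisect_right(splitters, s - width)
        hi = bisect.bisect_right(splitters, s + width)
        for p in range(lo, hi + 1):
            s_parts[p].append(s)
    result = []
    for rp, sp in zip(r_parts, s_parts):
        sp.sort()  # local join: binary search for the band in the sorted S partition
        for r in rp:
            i = bisect.bisect_left(sp, r - width)
            j = bisect.bisect_right(sp, r + width)
            result.extend((r, s) for s in sp[i:j])
    return result


pairs = partitioned_band_join([10, 26, 50], [8, 24, 27, 90], width=2, splitters=[25, 75])
```

Only S tuples near a partition boundary are replicated, so the extra work grows with the band width rather than with the relation size.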
Parallel sorting on a shared-nothing architecture using probabilistic splitting
TLDR
The authors consider the problem of external sorting in a shared-nothing multiprocessor with two techniques for determining ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, which uses sampling to estimate quantiles.
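Probabilistic splitting can be sketched in a few lines: draw a random sample of the keys, take evenly spaced quantiles of the sorted sample as split values, route each key to the processor owning its range, and sort each range locally. The names `probabilistic_split_sort` and `oversample` are hypothetical, the sketch sorts in memory rather than externally, and the exact-splitting alternative attributed to Iyer, Ricard, and Varman is not shown.

```python
import bisect
import random


def probabilistic_split_sort(keys, num_procs, oversample=32, seed=0):
    """Range-partition keys using sampled quantiles, then sort each partition.

    Returns (runs, splitters); concatenating runs in order yields sorted(keys).
    """
    rng = random.Random(seed)
    # Sample without replacement; a larger sample gives better-balanced ranges.
    sample = sorted(rng.sample(keys, min(len(keys), num_procs * oversample)))
    splitters = [sample[(i * len(sample)) // num_procs] for i in range(1, num_procs)]
    buckets = [[] for _ in range(num_procs)]
    for k in keys:
        buckets[bisect.bisect_right(splitters, k)].append(k)  # route key to its range
    runs = [sorted(b) for b in buckets]  # each "processor" sorts its own range
    return runs, splitters


keys = [(i * 7919) % 10007 for i in range(5000)]
runs, splitters = probabilistic_split_sort(keys, num_procs=4)
```

Since the split values are only estimates, the ranges may be somewhat unbalanced; the sample size (here controlled by `oversample`) trades sampling cost against that imbalance.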