Handling data skew in join algorithms using MapReduce

Authors: Jaeseok Myung, Junho Shim, Jongheum Yeon, Sang-goo Lee
Journal: Expert Syst. Appl.
One of the major obstacles to effective join processing on MapReduce is data skew. Because MapReduce’s basic hash-based partitioning cannot handle skew properly, two alternatives have been proposed: range-based and randomized methods. However, both still have drawbacks: the range-based method does not handle join product skew, and the randomized method performs worse than basic hash-based partitioning when the input relations are not skewed. In this paper, we present a…
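To make the skew problem concrete, the following minimal sketch compares reducer loads under hash-style and randomized partitioning for a skewed set of join keys. It is an illustration only, not the method proposed in the paper: the function names, the use of `key % num_reducers` as a stand-in for a hash function, and the synthetic key distribution (one hot key, 7, with 9,000 records plus 1,000 distinct cold keys) are all assumptions made for this example.

```python
import random
from collections import Counter


def hash_partition(keys, num_reducers):
    """Count records per reducer when every record goes to reducer (key mod n).

    The modulo is a simple stand-in for a hash function: all records that
    share a key necessarily land on the same reducer.
    """
    return Counter(key % num_reducers for key in keys)


def random_partition(keys, num_reducers, seed=0):
    """Count records per reducer when each record goes to a random reducer.

    This balances the load regardless of the key distribution. In a real
    join, the matching tuples of the other relation would also have to be
    made available to every reducer, which this sketch does not model.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    return Counter(rng.randrange(num_reducers) for _ in keys)


def make_skewed_keys(hot_key=7, hot_count=9000, cold_count=1000):
    """Build a hypothetical skewed input: one hot key plus distinct cold keys."""
    return [hot_key] * hot_count + list(range(cold_count))


keys = make_skewed_keys()
hash_load = hash_partition(keys, num_reducers=4)
random_load = random_partition(keys, num_reducers=4)

print("hash-style loads:", sorted(hash_load.items()))
print("randomized loads:", sorted(random_load.items()))
```

Under these assumptions, the hash-style partitioner sends every record of the hot key to a single reducer, while the randomized partitioner spreads the records roughly evenly. The sketch shows only input load; it says nothing about join product skew or about the replication overhead that makes randomized methods costly on unskewed inputs.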