Handling data skew in join algorithms using MapReduce

@article{Myung2016HandlingDS,
  title={Handling data skew in join algorithms using MapReduce},
  author={Jaeseok Myung and Junho Shim and Jongheum Yeon and Sang-goo Lee},
  journal={Expert Syst. Appl.},
  year={2016},
  volume={51},
  pages={286--299}
}
One of the major obstacles hindering effective join processing on MapReduce is data skew. Since MapReduce’s basic hash-based partitioning method cannot solve the problem properly, two alternatives have been proposed: range-based and randomized methods. However, they still have some drawbacks: the range-based method does not handle join product skew, and the randomized method performs worse than the basic hash-based partitioning when input relations are not skewed. In this paper, we present a…
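
To make the trade-off between hash-based and randomized partitioning concrete, the following is a minimal Python sketch. It is not the method proposed in the paper. It assumes one common randomized scheme (scatter the tuples of one relation at random and replicate the other relation to every reducer), and all names, relation sizes, and the reducer count are hypothetical choices made for illustration.

```python
import random
import zlib
from collections import Counter


def hash_reducer(key, num_reducers):
    """Deterministic hash partitioning: equal join keys always meet at one reducer."""
    return zlib.crc32(str(key).encode()) % num_reducers


def hash_loads(r_keys, s_keys, num_reducers):
    """Count the tuples each reducer receives under plain hash partitioning."""
    loads = Counter()
    for key in r_keys + s_keys:
        loads[hash_reducer(key, num_reducers)] += 1
    return loads


def randomized_loads(r_keys, s_keys, num_reducers, rng):
    """Assumed randomized scheme: scatter R at random, replicate S everywhere.

    Replication guarantees that every R tuple still meets all matching
    S tuples, at the price of shipping S num_reducers times.
    """
    loads = Counter()
    for _ in r_keys:
        loads[rng.randrange(num_reducers)] += 1
    for _ in s_keys:
        for reducer in range(num_reducers):
            loads[reducer] += 1
    return loads


NUM_REDUCERS = 4                            # hypothetical cluster size
r_keys = [0] * 900 + list(range(1, 101))    # skewed relation: key 0 dominates
s_keys = list(range(20))                    # small relation without skew

skewed_hash = hash_loads(r_keys, s_keys, NUM_REDUCERS)
skewed_rand = randomized_loads(r_keys, s_keys, NUM_REDUCERS, random.Random(0))

print("hash-based loads:", dict(skewed_hash))
print("randomized loads:", dict(skewed_rand))
print("tuples shipped (hash, randomized):",
      sum(skewed_hash.values()), sum(skewed_rand.values()))
```

In this sketch, hash partitioning sends all 900 tuples with key 0 to a single reducer, while the randomized scheme balances the load but ships more tuples in total because of replication. That extra shuffle cost is paid whether or not the input is skewed, which is consistent with the abstract's remark that the randomized method loses to plain hash partitioning on unskewed inputs.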