Optimizing Data Partitioning for Data-Parallel Computing

Qifa Ke, Vijayan Prabhakaran, Yinglian Xie, Yuan Yu, Jingyue Wu, Junfeng Yang

The performance of data-parallel computing (e.g., MapReduce, DryadLINQ) depends heavily on how its data is partitioned. The solutions implemented by current state-of-the-art systems are far from optimal. Techniques proposed by the database community for finding optimal data partitions are not directly applicable when complex user-defined functions and data models are…