High-performance design of Apache Spark with RDMA and its benefits on various workloads

@inproceedings{Lu2016HighperformanceDO,
  title={High-performance design of {Apache Spark} with {RDMA} and its benefits on various workloads},
  author={Xiaoyi Lu and Dipti Shankar and Shashank Gugnani and Dhabaleswar K. Panda},
  booktitle={2016 IEEE International Conference on Big Data (Big Data)},
  year={2016},
  pages={253--262}
}
The in-memory data processing framework Apache Spark has been stealing the limelight for low-latency interactive applications as well as iterative and batch computations. Our early experience study [17] has shown that Apache Spark can be enhanced to leverage advanced features (e.g., RDMA) of high-performance networks (e.g., InfiniBand and RoCE) to improve the performance of the shuffle phase. With the rapid evolution of the Apache Spark ecosystem, the Spark architecture has changed substantially. This motivates…