Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction

@inproceedings{Kocsis2016AutomaticIO,
  title={Automatic Improvement of Apache Spark Queries using Semantics-preserving Program Reduction},
  author={Zoltan A. Kocsis and John H. Drake and Douglas Carson and Jerry Swan},
  booktitle={GECCO},
  year={2016}
}
Apache Spark is a popular framework for large-scale data analytics. Unfortunately, Spark's performance can be difficult to optimise, since queries freely expressed in source code are not amenable to traditional optimisation techniques. This article describes Hylas, a tool for automatically optimising Spark queries embedded in source code via the application…