Scaling Spark in the Real World: Performance and Usability


Apache Spark is one of the most widely used open-source processing engines for big data, with rich language-integrated APIs and a wide range of libraries. Over the past two years, our group has worked to deploy Spark to a wide range of organizations through consulting relationships as well as our hosted service, Databricks. We describe the main challenges…


4 Figures and Tables



Citation Velocity: 45

Averaging 45 citations per year over the last 3 years.
