Apache Spark

Known as: Resilient Distributed Datasets, Resilient Distributed Dataset, Spark (cluster computing framework) 
Apache Spark is an open source cluster computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark…
Wikipedia

Papers overview

Highly Cited
2016
Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning…
Highly Cited
2016
This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Review
2016
Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper…
Highly Cited
2016
We describe matrix computations available in the cluster programming framework, Apache Spark. Out of the box, Spark provides…
Highly Cited
2015
Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on…
Highly Cited
2015
Apache Spark is an open source distributed data processing platform that uses distributed memory abstraction to process large…
Highly Cited
2015
The boom in technology has resulted in the emergence of new concepts and challenges. Big data is one of those much-discussed terms…
Highly Cited
2015
Data has long been the topic of fascination for Computer Science enthusiasts around the world, and has gained even more…
Highly Cited
2015
Apache Spark is one of the most widely used open source processing engines for big data, with rich language-integrated APIs and a…
Highly Cited
2010
MapReduce and its variants have been highly successful in implementing large-scale data-intensive applications on commodity…