
Apache Spark

Known as: Resilient Distributed Datasets, Resilient Distributed Dataset, Spark (cluster computing framework) 
Apache Spark is an open source cluster computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark…
Wikipedia

Papers overview

Semantic Scholar uses AI to extract papers important to this topic.
Review
2019
Contrary to using distant and centralized cloud data center resources, employing decentralized resources at the edge of a network…
Review
2019
The k‐nearest neighbors algorithm is characterized as a simple yet effective data mining technique. The main drawback of this…
Review
2018
The paper presents the details of designing and developing GeoSpark, which extends the core engine of Apache Spark and SparkSQL…
Review
2017
Today, a largely scalable computing environment provides a possibility of carrying out various data-intensive natural language…
Highly Cited
2016
Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning…
Highly Cited
2016
This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications. 
Highly Cited
2015
Spark SQL is a new module in Apache Spark that integrates relational processing with Spark's functional programming API. Built on…
Highly Cited
2015
The boom in technology has resulted in the emergence of new concepts and challenges. Big data is one of those much talked-about terms…
Highly Cited
2015
  • K. Wang, M. Khan
  • IEEE 17th International Conference on High…
  • 2015
  • Corpus ID: 16465129
Apache Spark is an open source distributed data processing platform that uses distributed memory abstraction to process large…
Highly Cited
2010
MapReduce and its variants have been highly successful in implementing large-scale data-intensive applications on commodity…