Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming

@article{Chintapalli2016BenchmarkingSC,
  title={Benchmarking Streaming Computation Engines: Storm, Flink and Spark Streaming},
  author={Sanket Chintapalli and D. Dagit and B. Evans and R. Farivar and Thomas Graves and Mark Holderbaugh and Zhuo Liu and Kyle Nusbaum and Kishorkumar Patil and Boyang Peng and Paul Poulosky},
  journal={2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
  year={2016},
  pages={1789--1792}
}
Streaming data processing has been gaining attention due to its applicability to a wide range of scenarios. To meet the booming demand for streaming data processing, many computation engines have been developed. However, there is still a lack of real-world benchmarks that would help in choosing the most appropriate platform for real-time streaming needs. To address this problem, we developed a streaming benchmark for three representative computation engines: Flink…
Evaluation of Stream Processing Frameworks
The increasing need for real-time insights in data sparked the development of multiple stream processing frameworks. Several benchmarking studies were conducted in an effort to form guidelines for…
TLDR
The relationship between latency, throughput, and resource consumption, as well as the performance impact of adding different common operations to the pipeline, is analyzed; the results show that the latency disadvantages of using a micro-batch system are most apparent for stateless operations.
Benchmarking Distributed Stream Processing Engines
TLDR
This paper proposes a framework to evaluate the performance of three stream data processing systems, namely Apache Storm, Apache Spark, and Apache Flink, and finds that there is no single winner; rather, each system excels in individual use cases.
Benchmarking Distributed Stream Data Processing Systems
TLDR
This paper builds the first benchmarking framework to define and test the sustainable performance of streaming systems, and uses it to evaluate in detail three widely used stream data processing systems: Apache Storm, Apache Spark, and Apache Flink.
Characterization of Big Data Stream Processing Pipeline: A Case Study using Flink and Kafka
TLDR
This work profiles the performance of a big data streaming pipeline that uses Apache Flink as the stream processing engine and Apache Kafka as the intermediate message queue, and finds that the intermediate stages of the pipeline are a significant contributor to the overall latency of the system.
ESPBench: The Enterprise Stream Processing Benchmark
TLDR
This work applies ESPBench to three state-of-the-art stream processing systems, Apache Spark, Apache Flink, and Hazelcast Jet, and highlights the need for the provided ESPBench toolkit, which supports benchmark execution and enables query result validation and objective latency measurement.
Apache Spark Streaming and HarmonicIO: A Performance and Architecture Comparison
TLDR
A performance benchmark comparing Apache Spark Streaming (ASS), under both file and TCP streaming modes, with HarmonicIO is presented, measuring maximum throughput over a broad range of message sizes and CPU loads.
Benchmarking Distributed Stream Processing Frameworks for Real Time Classical Machine Learning Applications
TLDR
This work benchmarks popular distributed stream processing frameworks for their applicability to classical machine learning models (Online K-Means, Online Linear Regression, and Online Logistic Regression) to help system designers choose the right model and the right framework for a given configuration on streaming data.
Exploratory Analysis of Spark Structured Streaming
TLDR
This work analyzes the new Spark Structured Streaming by examining its internal functionality, and shows that it can successfully run multiple queries in parallel on data with changing velocity and volume.
Apache Spark Streaming, Kafka and HarmonicIO: A Performance Benchmark and Architecture Comparison for Enterprise and Scientific Computing
TLDR
A benchmark of stream processing throughput comparing Apache Spark Streaming with HarmonicIO, a prototype P2P stream processing framework, is presented, suggesting which frameworks and streaming sources are likely to offer good performance for a given load.

References

S4: Distributed Stream Computing Platform
TLDR
The architecture resembles the Actor model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers.
Twitter Heron: Stream Processing at Scale
TLDR
Heron is now the de facto stream data processing engine inside Twitter; this paper presents the design and implementation of Heron and shares experiences from running it in production.
Spark: Cluster Computing with Working Sets
TLDR
Spark can outperform Hadoop by 10x in iterative machine learning jobs, and can be used to interactively query a 39 GB dataset with sub-second response time.
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
TLDR
The Dataflow Model, an approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing, is presented along with a detailed examination of the semantics it enables, an overview of the core principles that guided its design, and a validation of the model itself via the real-world experiences that led to its development.
Apache Apex Project
High-throughput, low-latency, and exactly-once stream processing with Apache Flink. http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink
Streaming Big Data: Storm, Spark and Samza. https://tsicilian.wordpress.com/2015/02/16/streaming-big-data-storm-spark-and-samza
Which Stream Processing Engine: Storm, Spark, Samza, or Flink? Which-Stream-Processing-Engine-Storm-Spark-Samza-or-Flink
Yahoo Streaming Benchmark. https://github.com/yahoo/streaming-benchmarks