A Performance Comparison of Open-Source Stream Processing Platforms

@article{Lopez2016APC,
  title={A Performance Comparison of Open-Source Stream Processing Platforms},
  author={Martin Andreoni Lopez and Antonio Gonzalez Pastana Lobato and Otto Carlos Muniz Bandeira Duarte},
  journal={2016 IEEE Global Communications Conference (GLOBECOM)},
  year={2016},
  pages={1-6}
}
Distributed stream processing platforms are a new class of real-time monitoring systems that analyze and extract knowledge from large continuous streams of data. This type of system is crucial for providing the high throughput and low latency required by Big Data or Internet of Things monitoring applications. This paper describes and analyzes three main open-source distributed stream-processing platforms: Storm, Flink, and Spark Streaming. We analyze the system architectures and we compare their…
Benchmarking Distributed Stream Data Processing Systems
TLDR
This paper uses their suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink, and builds the first benchmarking framework to define and test the sustainable performance of streaming systems.
Benchmarking Distributed Stream Processing Engines
TLDR
This paper proposes a framework to evaluate the performance of three SDPSs, namely Apache Storm, Apache Spark, and Apache Flink, and highlights that there is no single winner, but rather, each system excels in individual use-cases.
DSPBench: A Suite of Benchmark Applications for Distributed Data Stream Processing Systems
TLDR
A new benchmark suite composed of 15 applications coming from areas like Finance, Telecommunications, Sensor Networks, Social Networks and others is presented, describing in detail the nature of these applications, their full workload characterization in terms of selectivity, processing cost, input size and overall memory occupation.
Experimental Study on the Performance and Resource Utilization of Data Streaming Frameworks
  • Subarna Chatterjee, C. Morin
  • Computer Science
    2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
  • 2018
TLDR
This work presents a comparative study of the “green-ness” of the streaming platforms by analyzing their power consumption – one of the first attempts of its kind.
Big Stream Processing Systems: An Experimental Evaluation
TLDR
An extensive experimental study of five popular systems in the real-time streaming data processing domain, namely, Apache Storm, Apache Flink, Apache Spark, Kafka Streams and Hazelcast Jet is presented.
Quantitative Impact Evaluation of an Abstraction Layer for Data Stream Processing Systems
TLDR
A novel benchmark architecture is presented for comparing the performance impact of using Apache Beam on three streaming frameworks: Apache Spark Streaming, Apache Flink, and Apache Apex, and finds significant performance penalties when using Apache Beam for application development in the surveyed systems.
Maximum Sustainable Throughput Evaluation Using an Adaptive Method for Stream Processing Platforms
TLDR
An adaptive MST evaluation method is proposed that adds a data-growth factor function to the naïve method cycle that dynamically and adaptively tunes the data rate for each data growth cycle and has a lower error rate and executes faster than the naïve evaluation method.
Measuring stream processing systems adaptability under dynamic workloads
TLDR
An index called AI-SPS, inspired by the human cerebral auto-regulation process, is proposed that effectively quantifies the adaptation capacity of self-adaptive stream processing systems and is validated by evaluating the adaptive behavior of two state-of-the-art self-adaptive stream processing systems.
Dragon: A Lightweight, High Performance Distributed Stream Processing Engine
TLDR
Dragon is a good "all-rounder" solution that is particularly suitable for Edge computing applications, given its small installation footprint, and is competitive in performance with Storm and Heron.

References

Comparing Distributed Online Stream Processing Systems Considering Fault Tolerance Issues
TLDR
This paper presents an analysis of four online stream processing systems (MillWheel, S4, Spark Streaming and Storm) regarding the strategies they use for fault tolerance and discusses the advantages and disadvantages of the combination of the strategies for fault tolerance.
The 8 requirements of real-time stream processing
TLDR
Eight requirements that system software should meet to excel at a variety of real-time stream processing applications are outlined to provide high-level guidance to information technologists so that they will know what to look for when evaluating alternative stream processing solutions.
Stream Bench: Towards Benchmarking Modern Distributed Stream Computing Frameworks
TLDR
This paper raises the benchmark definition Stream Bench regarding the requirements, proposes a message system functioning as a mediator between stream data generation and consumption, and applies it to two popular frameworks, Apache Storm and Apache Spark Streaming.
Of Streams and Storms
The past few years have witnessed an unparalleled surge in both structured and unstructured data being generated by heterogeneous sources. These sources vary from scientific computations and sensor
Lightweight Asynchronous Snapshots for Distributed Dataflows
TLDR
This work proposes Asynchronous Barrier Snapshotting (ABS), a lightweight algorithm suited for modern dataflow execution engines that minimises space requirements and persists only operator states on acyclic execution topologies while keeping a minimal record log on cyclic dataflows.
Discretized streams: fault-tolerant streaming computation at scale
TLDR
D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes, and tolerates stragglers, and can easily be composed with batch and interactive query models like MapReduce, enabling rich applications that combine these modes.
Scheduling Decisions in Stream Processing on Heterogeneous Clusters
  • M. Rychlý, P. Škoda, P. Smrz
  • Computer Science
    2014 Eighth International Conference on Complex, Intelligent and Software Intensive Systems
  • 2014
TLDR
A proposal of a novel scheduler for stream processing frameworks on heterogeneous clusters is presented, which employs design-time knowledge as well as benchmarking techniques to achieve optimal resource-aware deployment of applications over the clusters and eventually better overall utilization of the cluster.
S4: Distributed Stream Computing Platform
TLDR
The architecture resembles the Actors model, providing semantics of encapsulation and location transparency, thus allowing applications to be massively concurrent while exposing a simple programming interface to application developers.
A Performance Analysis of System S, S4, and Esper via Two Level Benchmarking
TLDR
This paper compares and contrasts the performance characteristics of three stream processing software systems, System S, S4, and Esper, using 70 different application scenarios, and observes that S4's architecture, which instantiates a Processing Element for each keyed attribute, is less efficient than the fixed number of PEs used by System S and Esper.
Online stream processing of machine-to-machine communications traffic: A platform comparison
TLDR
The results show that, by using DSPS services, the implementations of a DSPS-based data analysis application on top of either the well-known Storm DSPS or the Quasit middleware are able to largely meet the real-time processing requirements of the use-case scenario.