S4: Distributed Stream Computing Platform

@inproceedings{Neumeyer2010S4DS,
  title={S4: Distributed Stream Computing Platform},
  author={Leonardo Neumeyer and Bruce Robbins and Anish Nair and Anand Kesari},
  booktitle={2010 IEEE International Conference on Data Mining Workshops},
  year={2010},
  pages={170--177}
}
S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. [...] Key Result: We show that the S4 design is surprisingly flexible and lends itself to running in large clusters built with commodity hardware.


Data-centric Programming for Distributed Systems
TLDR
This thesis presents an attempt to avert this crisis by rethinking both the languages used to implement distributed systems and the analyses and tools used to understand them, and develops Bloom, which provides powerful analysis capabilities that identify when distributed programs produce deterministic outcomes despite widespread nondeterminism in their executions.
Streaming data analytics via message passing with application to graph algorithms
Enorm: efficient window-based computation in large-scale distributed stream processing systems
TLDR
A new framework is proposed, which is designed to expose sufficient semantic information of the applications to enable the aforementioned effective optimizations, while on the other hand, maintaining the flexibility of Storm's original programming framework.
Fault-Tolerant Streaming Computation with BlockMon
TLDR
This paper presents the performance of the distributed stream-processing platform Blockmon, with a novel fault-tolerant mechanism implemented on top of it, and compares it against Spark, the state of the art among fault-tolerant stream-processing platforms.
AJIRA: A Lightweight Distributed Middleware for MapReduce and Stream Processing
TLDR
The evaluation shows that AJIRA is competitive in a wide range of scenarios both in terms of processing time and scalability, making it an ideal choice where flexibility, extensibility, and the processing of both large and dynamic data with a single programming model are either desirable or even mandatory requirements.
Large-Scale Data Stream Processing Systems
TLDR
This chapter introduces the major design aspects of large scale data stream processing systems, covering programming model abstraction levels and runtime concerns, and presents a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor used for a wide variety of processing tasks.
Esc: Towards an Elastic Stream Computing Platform for the Cloud
TLDR
Esc is a new stream computing engine designed for computations with real-time demands, such as online data mining, that offers a simple programming model in which programs are specified by directed acyclic graphs (DAGs).
Discretized streams: fault-tolerant streaming computation at scale
TLDR
D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes, and tolerates stragglers, and can easily be composed with batch and interactive query models like MapReduce, enabling rich applications that combine these modes.
Designing Twister2: Efficient Programming Environment Toolkit for Big Data
TLDR
This paper studies existing systems, candidate event-driven runtimes, the design choices they have made for each component, and how these choices affect the types of applications they can support, and proposes a loosely coupled component-based approach for designing a big data toolkit where each component can have different implementations to support various applications.
TimeStream: reliable stream computation in the cloud
TLDR
This work advocates a powerful new abstraction called resilient substitution that caters to the specific needs in this new computation model to handle failure recovery and dynamic reconfiguration in response to load changes.

References

SPC: a distributed, scalable platform for data mining
TLDR
The SPC programming model is described, which is, to the best of the authors' knowledge, the first to support stream-mining applications using a subscription-like model for specifying stream connections as well as to provide support for non-relational operators.
ZooKeeper: Wait-free Coordination for Internet-scale Systems
TLDR
ZooKeeper provides a per client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state to enable the implementation of a high performance processing pipeline with read requests being satisfied by local servers.
ACTORS - a model of concurrent computation in distributed systems
  • G. Agha
  • Computer Science
    MIT Press series in artificial intelligence
  • 1985
TLDR
A foundational model of concurrency is developed and issues in the design of parallel systems and why the actor model is suitable for exploiting large-scale parallelism are addressed.
The 8 requirements of real-time stream processing
TLDR
Eight requirements that system software should meet to excel at a variety of real-time stream processing applications are outlined to provide high-level guidance to information technologists so that they will know what to look for when evaluating alternative stream processing solutions.
MapReduce Online
TLDR
A modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed, and can reduce completion times and improve system utilization for batch jobs as well.
Actor frameworks for the JVM platform: a comparative analysis
TLDR
This paper analyzes some of the more significant efforts to build actor-oriented frameworks for the JVM platform in terms of their execution semantics, the communication and synchronization abstractions provided, and the representations used in the implementations.
The power of events - an introduction to complex event processing in distributed enterprise systems
TLDR
Some possible long-term future roles of CEP in the Information Society are discussed along with the need to develop rule-based event hierarchies on a commercial basis to make those applications possible.
Open-source Projects
TLDR
With the introduction of the World Wide Web (WWW) in 1993, a new range of protocols, such as the hypertext transfer protocol (HTTP), emerged, which subsequently became standards and allowed for the easy specification of a path to resources on the World Wide Web.
A Simplex Method for Function Minimization
A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of