Corpus ID: 227239208

LifeStream: A High-performance Stream Processing Engine for Waveform Data

Anand Jayarajan, Kimberly Hau, Andrew Goodwin, Gennady Pekhimenko
Hospitals around the world collect massive amounts of physiological data from their patients every day. Recently, there has been increasing research interest in subjecting this data to statistical analysis to gain more insights and provide improved medical diagnoses. Enabling such advancements in healthcare requires efficient data processing systems. In this paper, we show that currently available data processing solutions either fail to meet the performance requirements or lack simple and… 
How to validate Machine Learning Models Prior to Deployment: Silent trial protocol for evaluation of real-time models at ICU
A silent trial protocol for evaluating models in real-time in the ICU setting following principles of formative testing and gathering information that can be used to refine the model to best fit within the intended environment of deployment is introduced.


A practical approach to storage and retrieval of high frequency physiological signals.
A novel time series storage solution is presented that specifically targets physiological waveforms and other associated clinical and medical device data. It is designed to serve as a data source for high-performance computing systems and provides an Application Programming Interface for functional, rapid data retrieval.
TerseCades: Efficient Data Compression in Stream Processing
This work has demonstrated that compression can be effective for stream processing, both in the ability to process in larger windows and in throughput, through a series of optimizations on a stream engine itself to remove major sources of inefficiency.
Dynamically Scaling Apache Storm for the Analysis of Streaming Data
This paper describes the design and implementation of a tool that monitors several aspects of the Storm platform, the applications running on top of it, and external systems such as queues and databases, and decides whether extra servers are needed or machines may be decommissioned from the cluster.
Naiad: a timely dataflow system
It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.
StreamBox: Modern Stream Processing on a Multicore Machine
A novel stream processing engine called StreamBox is presented that exploits the parallelism and memory hierarchy of modern multicore hardware and introduces a data structure called cascading containers, which dynamically manages concurrency and dependences among epochs in the transform pipeline.
Weld: A Common Runtime for High Performance Data Analytics
Weld is proposed, a runtime for data-intensive applications that optimizes across disjoint libraries and functions and uses a common intermediate representation to capture the structure of diverse data-parallel workloads, including SQL, machine learning, and graph analytics.
Structured Streaming: A Declarative API for Real-Time Applications in Apache Spark
Structured Streaming is a new high-level streaming API in Apache Spark based on the experience with Spark Streaming that achieves high performance via Spark SQL's code generation engine and can outperform Apache Flink by up to 2x and Apache Kafka Streams by 90x.
StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory
The design and implementation of StreamBox-HBM is presented, a stream analytics engine that exploits hybrid memories to achieve scalable high performance and is the first stream engine optimized for hybrid memories.
The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
One such approach is presented, the Dataflow Model, along with a detailed examination of the semantics it enables, an overview of the core principles that guided its design, and a validation of the model itself via the real-world experiences that led to its development.