Stateful Scalable Stream Processing at LinkedIn

@article{Noghabi2017StatefulSS,
  title={Stateful Scalable Stream Processing at LinkedIn},
  author={Shadi A. Noghabi and Kartik Paramasivam and Yi Pan and Navina Ramesh and Jon Bringhurst and Indranil Gupta and Roy H. Campbell},
  journal={PVLDB},
  year={2017},
  volume={10},
  pages={1634--1645}
}
Distributed stream processing systems need to support stateful processing, recover quickly from failures to resume such processing, and reprocess an entire data stream quickly. We present Apache Samza, a distributed system for stateful and fault-tolerant stream processing. Samza utilizes a partitioned local state along with a low-overhead background changelog mechanism, allowing it to scale to massive state sizes (hundreds of TB) per application. Recovery from failures is sped up by re…
This paper has 19 citations.