MonetDB/DataCell: Online Analytics in a Streaming Column-Store
@article{Liarou2012MonetDBDataCellOA, title={MonetDB/DataCell: Online Analytics in a Streaming Column-Store}, author={Erietta Liarou and Stratos Idreos and Stefan Manegold and Martin L. Kersten}, journal={Proc. VLDB Endow.}, year={2012}, volume={5}, pages={1910-1913} }
In DataCell, we design streaming functionalities in a modern relational database kernel which targets big data analytics. This includes exploitation of both its storage/execution engine and its optimizer infrastructure. We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages for modern applications in need of online analytics such as web logs, network monitoring and scientific data management. The major challenge then…
21 Citations
Enhanced stream processing in a DBMS kernel
- Computer Science, EDBT '13
- 2013
This paper focuses on incremental window-based processing, arguably the most crucial stream-specific requirement, and designs a stream engine on top of an existing relational database kernel, in order to maintain and reuse the generic storage and execution model of the DBMS.
Database support for processing complex aggregate queries over data streams
- Computer Science, EDBT '13
- 2013
The goal of this thesis is to investigate the potential of combining database systems with SPEs in the context of stream processing so as to improve the overall query evaluation performance.
SnappyData: Streaming, Transactions, and Interactive Analytics in a Unified Engine
- Computer Science
- 2016
SnappyData is the first to offer end users an intuitive means for expressing their accuracy requirements without overwhelming them with statistical concepts, through a novel concept of high-level accuracy contracts (HAC).
SnappyData: A Unified Cluster for Streaming, Transactions and Interactive Analytics
- Computer Science, CIDR
- 2017
SnappyData is presented as the first unified engine capable of delivering analytics, transactions, and stream processing in a single integrated cluster by carefully marrying a big data computational engine with a scale-out transactional store.
ENTRADA: A high-performance network traffic data streaming warehouse
- Computer Science, NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium
- 2016
We present ENTRADA, a high-performance data streaming warehouse that enables researchers and operators to analyze vast amounts of network traffic and measurement data within interactive response…
SnappyData: A Hybrid Transactional Analytical Store Built On Spark
- Computer Science, SIGMOD Conference
- 2016
This work proposes a unified engine for real-time operational analytics, delivering stream analytics, OLTP and OLAP in a single integrated solution through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics).
DBStream: An online aggregation, filtering and processing system for network traffic monitoring
- Computer Science, 2014 International Wireless Communications and Mobile Computing Conference (IWCMC)
- 2014
DBStream is introduced, a novel online traffic monitoring system based on the DSW paradigm, which allows fast and flexible analysis across multiple heterogeneous data sources, and provides a novel stream processing language for implementing data processing modules, as well as aggregation, filtering, and storage capabilities for further data analysis.
Large-scale network traffic monitoring with DBStream, a system for rolling big data analysis
- Computer Science, 2014 IEEE International Conference on Big Data (Big Data)
- 2014
DBStream is described, an SQL-based system that explicitly supports incremental queries for rolling data analysis, and a performance comparison of DBStream with a parallel data processing engine (Spark) is presented, showing that, in some scenarios, a single DBStream node can outperform a cluster of ten Spark nodes on rolling network monitoring workloads.
DBStream: A holistic approach to large-scale network traffic monitoring and analysis
- Computer Science, Comput. Networks
- 2016
A thin monitoring layer for top-k aggregation queries over a database
- Computer Science, DBRank '13
- 2013
The proposed family of maintenance algorithms further exploits the relations between the monitored rankings known from multi-query optimisation, and presents results of a preliminary experimental evaluation using TPC-H data to study the performance of the algorithms.
References
Experience in Extending Query Engine for Continuous Analytics
- Computer Science, DaWaK
- 2010
The result is a new kind of tightly integrated, highly efficient system with advanced stream processing capability as well as full DBMS functionality, which can significantly reduce the engineering investment needed for developing the streaming technology.
Continuous Analytics: Rethinking Query Processing in a Network-Effect World
- Computer Science, CIDR
- 2009
This paper describes the Continuous Analytics approach and outlines some of the key technical arguments behind it, creating a powerful and flexible system that can run SQL over tables, streams, and combinations of the two.
Exploiting the power of relational databases for efficient stream processing
- Computer Science, EDBT '09
- 2009
A complete architecture, the DataCell, is proposed and implemented on top of an open-source column-oriented DBMS. It allows batch processing of tuples and selective picking of tuples from a basket based on the query requirements, exploiting a novel query component, the basket expressions.
TelegraphCQ: continuous dataflow processing
- Computer Science, SIGMOD '03
- 2003
The current version of TelegraphCQ is shown, which is implemented by leveraging the code base of the open source PostgreSQL database system; a significant portion of the PostgreSQL code was found to be easily reusable.
Operator scheduling in data stream systems
- Computer Science, The VLDB Journal
- 2004
The aim is to design a scheduling strategy that minimizes the maximum runtime system memory while maintaining the output latency within prespecified bounds; the paper presents Chain scheduling, an operator scheduling strategy for data stream systems that is near-optimal in minimizing runtime memory usage.
Algorithms and metrics for processing multiple heterogeneous continuous queries
- Computer Science, TODS
- 2008
This article examines the problem of how to schedule multiple Continuous Queries in a DSMS to optimize different Quality of Service (QoS) metrics, and proposes a hybrid scheduling policy that strikes a fine balance between performance and fairness.
The Case for a Signal-Oriented Data Stream Management System
- Computer Science, CIDR
- 2007
This paper motivates the need for a data management and continuous query processing architecture that integrates two different desired classes of functions into a single, unified software system.
IBM InfoSphere Streams for scalable, real-time, intelligent transportation services
- Computer Science, SIGMOD Conference
- 2010
A prototype system that generates dynamic, multi-faceted views of transportation information for the city of Stockholm, using real vehicle GPS and road-network data is described and the use of IBM InfoSphere Streams, a scalable stream processing platform, is demonstrated.
Self-organizing tuple reconstruction in column-stores
- Computer Science, SIGMOD Conference
- 2009
A novel design, partial sideways cracking, is proposed that achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself, and brings significant performance benefits for multi-attribute queries.
NiagaraCQ: a scalable continuous query system for Internet databases
- Computer Science, SIGMOD '00
- 2000
The design of NiagaraCQ is presented, and some experimental results on the system's performance and scalability are given. Other techniques are also employed, including incremental evaluation of continuous queries, use of both pull and push models for detecting heterogeneous data source changes, and memory caching.