MonetDB/DataCell: Online Analytics in a Streaming Column-Store

@article{Liarou2012MonetDBDataCellOA,
  title={MonetDB/DataCell: Online Analytics in a Streaming Column-Store},
  author={Erietta Liarou and Stratos Idreos and Stefan Manegold and Martin L. Kersten},
  journal={Proc. VLDB Endow.},
  year={2012},
  volume={5},
  pages={1910--1913}
}
In DataCell, we design streaming functionalities in a modern relational database kernel that targets big data analytics. This includes exploiting both its storage/execution engine and its optimizer infrastructure. We investigate the opportunities and challenges that arise with such a direction, and we show that it carries significant advantages for modern applications in need of online analytics, such as web logs, network monitoring and scientific data management. The major challenge then…

Citations

Enhanced stream processing in a DBMS kernel
TLDR
This paper focuses on incremental window-based processing, arguably the most crucial stream-specific requirement, and designs a stream engine on top of an existing relational database kernel in order to maintain and reuse the generic storage and execution model of the DBMS.
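Incremental window-based processing can be illustrated with a minimal Python sketch, assuming a count-based sliding window whose size is a multiple of its slide. The window is split into basic windows, and only one partial sum per basic window is kept, so each slide adds the newest partial and drops the oldest instead of re-scanning the whole window. The function name `sliding_sums` and its parameters are hypothetical, and this is one common incremental technique, not necessarily the exact design of the cited paper.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_sums(stream: Iterable[float], window: int, step: int) -> Iterator[float]:
    """Yield the sum of the last `window` tuples every `step` tuples (hypothetical helper).

    Assumes a count-based window whose size is a multiple of the slide `step`.
    """
    if window % step != 0:
        raise ValueError("window must be a multiple of step")
    n_basic = window // step          # basic windows per full window
    partials: deque[float] = deque()  # partial sums of the basic windows in the window
    running = 0.0                     # sum of all partials currently in the window
    current, count = 0.0, 0           # the basic window currently being filled
    for value in stream:
        current += value
        count += 1
        if count == step:             # a basic window is complete
            partials.append(current)
            running += current
            if len(partials) > n_basic:   # expire the oldest basic window
                running -= partials.popleft()
            if len(partials) == n_basic:  # the window is full: emit a result
                yield running
            current, count = 0.0, 0
```

For example, `list(sliding_sums(range(1, 9), window=4, step=2))` produces `[10.0, 18.0, 26.0]`, the sums of tuples 1-4, 3-6 and 5-8. For non-integer data, repeated subtraction can accumulate floating-point error; recomputing `running` from `partials` is a simple alternative.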
Database support for processing complex aggregate queries over data streams
TLDR
The goal of this thesis is to investigate the potential of combining database systems with stream processing engines (SPEs) so as to improve overall query evaluation performance.
SnappyData: Streaming, Transactions, and Interactive Analytics in a Unified Engine
In recent years, our customers have expressed frustration with the traditional approach of using a combination of disparate products to handle their streaming, transactional and analytical needs. The…
SnappyData: A Unified Cluster for Streaming, Transactions and Interactive Analytics
TLDR
SnappyData is presented as the first unified engine capable of delivering analytics, transactions, and stream processing in a single integrated cluster by carefully marrying a big data computational engine with a scale-out transactional store.
ENTRADA: A high-performance network traffic data streaming warehouse
We present ENTRADA, a high-performance data streaming warehouse that enables researchers and operators to analyze vast amounts of network traffic and measurement data within interactive response…
SnappyData: A Hybrid Transactional Analytical Store Built On Spark
TLDR
This work proposes a unified engine for real-time operational analytics, delivering stream analytics, OLTP and OLAP in a single integrated solution through a seamless integration of Apache Spark (as a big data computational engine) with GemFire (as an in-memory transactional store with scale-out SQL semantics).
DBStream: An online aggregation, filtering and processing system for network traffic monitoring
TLDR
This paper introduces DBStream, a novel online traffic monitoring system based on the Data Stream Warehouse (DSW) paradigm, which allows fast and flexible analysis across multiple heterogeneous data sources. DBStream provides a novel stream processing language for implementing data processing modules, as well as aggregation, filtering, and storage capabilities for further data analysis.
Large-scale network traffic monitoring with DBStream, a system for rolling big data analysis
TLDR
This paper describes DBStream, an SQL-based system that explicitly supports incremental queries for rolling data analysis, and presents a performance comparison of DBStream with a parallel data processing engine (Spark), showing that, in some scenarios, a single DBStream node can outperform a cluster of ten Spark nodes on rolling network monitoring workloads.
DBStream: A holistic approach to large-scale network traffic monitoring and analysis
TLDR
This paper presents DBStream, a holistic approach to large-scale network monitoring and analysis applications, and shows how its Continuous Execution Language (CEL) can be used to automate several data processing and analysis tasks typical of monitoring operational ISP networks.
A thin monitoring layer for top-k aggregation queries over a database
TLDR
The paper proposes a family of maintenance algorithms that further exploits the relations between the monitored rankings, as known from multi-query optimisation, and presents results of a preliminary experimental evaluation on TPC-H data that studies the performance of the algorithms.

References

Showing 1-10 of 22 references
Experience in Extending Query Engine for Continuous Analytics
TLDR
The result is a new kind of tightly integrated, highly efficient system that combines advanced stream processing capability with full DBMS functionality and can significantly reduce the engineering investment needed to develop streaming technology.
Continuous Analytics: Rethinking Query Processing in a Network-Effect World
TLDR
This paper describes the Continuous Analytics approach and outlines some of the key technical arguments behind it; the result is a powerful and flexible system that can run SQL over tables, streams, and combinations of the two.
Exploiting the power of relational databases for efficient stream processing
TLDR
The paper proposes a complete architecture, the DataCell, implemented on top of an open-source column-oriented DBMS. It allows batch processing of tuples and can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, basket expressions.
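The basket idea can be sketched in Python, assuming an in-memory list stands in for a basket table. The `Basket` class and its `append` and `consume` methods are hypothetical illustrations, not DataCell's or MonetDB's actual interface: `consume` removes every tuple matching a query predicate in one batch and leaves the rest for other continuous queries, which loosely mirrors what a basket expression does.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class Basket(Generic[T]):
    """Hypothetical stand-in for a basket: a buffer of not-yet-consumed stream tuples."""
    tuples: List[T] = field(default_factory=list)

    def append(self, *items: T) -> None:
        # Newly arriving stream tuples are appended at the end.
        self.tuples.extend(items)

    def consume(self, predicate: Callable[[T], bool]) -> List[T]:
        """Remove and return, as one batch, every tuple that satisfies `predicate`.

        Non-qualifying tuples stay in the basket for other continuous queries.
        """
        picked: List[T] = []
        kept: List[T] = []
        for t in self.tuples:
            (picked if predicate(t) else kept).append(t)
        self.tuples = kept
        return picked


# Usage: a continuous query that only wants readings above 10.
basket: Basket[Tuple[str, int]] = Basket()
basket.append(("a", 5), ("b", 42), ("c", 17))
hot = basket.consume(lambda reading: reading[1] > 10)
print(hot)            # [('b', 42), ('c', 17)]
print(basket.tuples)  # [('a', 5)]
```

Processing the qualifying tuples as one batch, rather than one at a time, is what lets the underlying column-store apply its bulk operators to stream data.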
TelegraphCQ: continuous dataflow processing
TLDR
The paper presents the current version of TelegraphCQ, implemented by leveraging the code base of the open-source PostgreSQL database system; the authors found that a significant portion of the PostgreSQL code was easily reusable.
Operator scheduling in data stream systems
TLDR
The aim is to design a scheduling strategy that minimizes the maximum runtime system memory while keeping the output latency within prespecified bounds. The paper presents Chain scheduling, an operator scheduling strategy for data stream systems that is near-optimal in minimizing runtime memory usage.
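Chain's priority assignment can be sketched in Python, as we understand its published description: build the progress chart of an operator path from each operator's per-tuple processing time and selectivity, take the lower envelope of that chart, and give every operator on an envelope segment the steepness of that segment. The function `chain_priorities` and its input format are hypothetical; the sketch assumes strictly positive processing times and omits the scheduler's runtime loop.

```python
from typing import List, Optional, Sequence, Tuple


def chain_priorities(ops: Sequence[Tuple[float, float]]) -> List[float]:
    """Assign a Chain-style priority to each operator of one pipeline (hypothetical helper).

    `ops` holds (processing_time_per_tuple, selectivity) pairs in pipeline
    order; processing times are assumed to be strictly positive.
    """
    # Progress chart: point k is (time spent after k operators, remaining tuple size).
    points = [(0.0, 1.0)]
    for cost, selectivity in ops:
        t, s = points[-1]
        points.append((t + cost, s * selectivity))

    priorities = [0.0] * len(ops)
    i = 0
    while i < len(ops):
        # Next lower-envelope vertex: the later point reached with the steepest
        # descent from point i; ties go to the farthest point.
        best_j: int = i + 1
        best_slope: Optional[float] = None
        for j in range(i + 1, len(points)):
            slope = (points[j][1] - points[i][1]) / (points[j][0] - points[i][0])
            if best_slope is None or slope <= best_slope:
                best_j, best_slope = j, slope
        assert best_slope is not None
        # Operators i .. best_j-1 lie on this envelope segment and share its
        # steepness (tuple size released per unit of processing time).
        for k in range(i, best_j):
            priorities[k] = -best_slope
        i = best_j
    return priorities
```

For instance, `chain_priorities([(1, 1.0), (1, 0.0)])` returns `[0.5, 0.5]`: the non-selective first operator shares the priority of the highly selective operator after it, because both lie on the same envelope segment. At run time, Chain would then repeatedly execute the ready tuple whose next operator has the highest priority, breaking ties in favor of the earliest-arriving tuple.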
Algorithms and metrics for processing multiple heterogeneous continuous queries
TLDR
This article examines how to schedule multiple continuous queries in a data stream management system (DSMS) to optimize different Quality of Service (QoS) metrics, and proposes a hybrid scheduling policy that strikes a fine balance between performance and fairness.
The Case for a Signal-Oriented Data Stream Management System
TLDR
This paper motivates the need for a data management and continuous query processing architecture that integrates two different desired classes of functions into a single, unified software system.
IBM InfoSphere Streams for scalable, real-time, intelligent transportation services
TLDR
The paper describes a prototype system that generates dynamic, multi-faceted views of transportation information for the city of Stockholm from real vehicle GPS and road-network data, and demonstrates the use of IBM InfoSphere Streams, a scalable stream processing platform.
Self-organizing tuple reconstruction in column-stores
TLDR
The paper proposes a novel design, partial sideways cracking, that achieves performance similar to using presorted data without requiring the heavy initial presorting step, and brings significant performance benefits for multi-attribute queries.
NiagaraCQ: a scalable continuous query system for Internet databases
Continuous queries are persistent queries that allow users to receive new results when they become available. While continuous query systems can transform a passive web into an active environment, …