The Design of the Borealis Stream Processing Engine
TLDR
This paper outlines the basic design and functionality of Borealis, and presents a highly flexible and scalable QoS-based optimization model that operates across server and sensor networks and a new fault-tolerance model with flexible consistency-availability trade-offs.
C-Store: A Column-oriented DBMS
TLDR
Preliminary performance data on a subset of TPC-H show that C-Store, the system the team is building, is substantially faster than popular commercial products.
A comparison of approaches to large-scale data analysis
TLDR
A benchmark consisting of a collection of tasks, run on an open source version of MapReduce (MR) as well as on two parallel DBMSs, shows a dramatic performance difference between the two paradigms.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
TLDR
This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
H-store: a high-performance, distributed main memory transaction processing system
TLDR
The demonstration presented here provides insight on the development of a distributed main memory OLTP database and allows for the further study of the challenges inherent in this operating environment.
High-availability algorithms for distributed stream processing
TLDR
The design and algorithmic challenges associated with the proposed recovery techniques are discussed, along with how each technique can provide different guarantees through appropriate combinations of redundant processing, checkpointing, and remote logging.
MapReduce and parallel DBMSs: friends or foes?
TLDR
MapReduce complements DBMSs since databases are not designed for extract-transform-load tasks, a MapReduce specialty.
Aurora: a data stream management system
TLDR
This work proposes to demonstrate the Aurora system with its development environment and runtime system, with several example monitoring applications developed in consultation with defense, financial, and natural science communities, and shows the effect of various system alternatives on various workloads.
Correlation Maps: A Compressed Access Method for Exploiting Soft Functional Dependencies
TLDR
It is shown that in a real application (SDSS) and a widely used benchmark (TPC-H), there are many cases of attribute correlation that can be exploited to accelerate queries; a tool that automatically suggests useful pairs of correlated attributes is also discussed.
...
...