An architecture for recycling intermediates in a column-store

@article{Ivanova2010AnAF,
  title={An architecture for recycling intermediates in a column-store},
  author={Milena Ivanova and Martin L. Kersten and Niels Nes and Romulo Goncalves},
  journal={ACM Trans. Database Syst.},
  year={2010},
  volume={35},
  pages={24:1-24:43}
}
Automatic recycling of intermediate results to improve both query response time and throughput is a grand challenge for state-of-the-art databases. Tuples are loaded and streamed through a tuple-at-a-time processing pipeline, avoiding materialization of intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning. In contrast, the operator-at-a-time execution paradigm produces fully…
Big data availability: Selective partial checkpointing for in-memory database queries
Fault tolerance is an important challenge for supporting critical big data analytic operations. Most existing solutions only provide fault tolerant data replication, requiring failed queries to be…
Recycling in pipelined query evaluation
Database systems typically execute queries in isolation. Sharing recurring intermediate and final results between successive query invocations is ignored or only exploited by caching final query…
The DBMS-your Big Data
When addressing the problem of “big” data volume, preparation costs are one of the key challenges: the high costs for loading, aggregating and indexing data lead to a long data-to-insight time. In…
Cache-Based Multi-Query Optimization for Data-Intensive Scalable Computing Frameworks
A novel method combining in-memory cache primitives and multi-query optimization to improve the efficiency of data-intensive, scalable computing frameworks; it shows significant benefits of work sharing for both TPC-DS workloads and detailed micro-benchmarks.
A quantitative study of two matrix clustering algorithms
A novel matrix clustering algorithm is considered that performs attribute replication during the branch and bound search and is compared with the best one of the earlier algorithms using both real and synthetic workloads.
Computation Reuse in Analytics Job Service at Microsoft
This paper describes a computation reuse framework, coined CLOUDVIEWS, which is built to address the computation overlap problem in Microsoft's SCOPE job service, and presents a detailed analysis of production workloads to motivate the computation overlap problem and the possible gains from computation reuse.
Database-inspired optimizations for statistical analysis
This work proposes to vastly improve the execution of R scripts by interpreting them as a declaration of intent rather than an imperative order set in stone, which allows for optimization techniques from the columnar data management research field to be applied.
Scalable and Efficient Analysis of Large High-Dimensional Data Sets in the Context of Recurrence Analysis
This thesis introduces scalable recurrence analysis (SRA), which is an alternative computing approach that subdivides a recurrence matrix into multiple submatrices and reduces the runtime for analysing time series exceeding one million data points from hours or days to minutes.
Selecting Subexpressions to Materialize at Datacenter Scale
This work focuses on the problem of subexpression selection for large workloads, i.e., selecting common parts of job plans and materializing them to speed up the evaluation of subsequent jobs. It introduces BigSubs, a vertex-centric graph algorithm that iteratively chooses, in parallel, which subexpressions to materialize and which subexpressions to use for evaluating each job.
A study of PosDB Performance in a Distributed Environment
PosDB is a new disk-based distributed column-store relational engine intended for research purposes. It uses the Volcano pull-based model and late materialization for query processing, and join indexes…

References

Dynamic Materialized Views
Experimental results in Microsoft SQL Server show that compared with conventional materialized views, dynamic materialized views greatly reduce storage requirements and maintenance costs while achieving better query performance with improved buffer pool efficiency.
Efficient exploitation of similar subexpressions for query processing
This work introduces a light-weight and effective mechanism to detect potential sharing opportunities among expressions and presents the first comprehensive solution covering all aspects of the problem: detection, construction, and cost-based optimization.
Flexible and efficient IR using array databases
It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing including compression, score materialization and quantization, such as employed by custom-built IR systems.
Optimizing queries using materialized views: a practical, scalable solution
A fast and scalable algorithm for determining whether part or all of a query can be computed from materialized views and how it can be incorporated in transformation-based optimizers is presented.
Breaking the memory wall in MonetDB
This paper reports how research around the MonetDB database system has led to a redesign of database architecture in order to take advantage of modern hardware, and in particular to avoid hitting the memory wall.
Super-Scalar RAM-CPU Cache Compression
This work proposes three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs and compares these algorithms with compression techniques used in (commercial) database and information retrieval systems.
An architecture for recycling intermediates in a column-store
This paper studies an architecture that harvests the by-products of the operator-at-a-time paradigm in a column store system using a lightweight mechanism, the recycler. It indicates the potential of recycling intermediates and charts a route for further development of database kernels.
Content-based filtering for efficient online materialized view maintenance
It is shown that the content-based method can catch most (or all) irrelevant updates to base relations that are missed by the traditional method, and the load on the RDBMS due to materialized view maintenance can be significantly reduced.
Dynamic Materialization of Query Views for Data Warehouse Workloads
  • T. Phan, Wen-Syan Li
  • Computer Science
  • 2008 IEEE 24th International Conference on Data Engineering
  • 2008
An automated, dynamic MQT management scheme that materializes views and creates indexes in an on-demand fashion as a workload executes and manages them with an LRU cache to maximize the benefit of executing queries with MQTs.
Self-organizing strategies for a column-store database
This work presents two workload-driven self-organizing techniques in a column-store, i.e. adaptive segmentation and adaptive replication, which can significantly improve system performance as demonstrated in an evaluation of different scenarios.