An architecture for recycling intermediates in a column-store

@article{Ivanova2010AnAF,
  title={An architecture for recycling intermediates in a column-store},
  author={Milena Ivanova and Martin L. Kersten and Niels Nes and Romulo Goncalves},
  journal={ACM Trans. Database Syst.},
  year={2010},
  volume={35},
  pages={24:1-24:43}
}
Automatic recycling of intermediate results to improve both query response time and throughput is a grand challenge for state-of-the-art databases. Tuples are loaded and streamed through a tuple-at-a-time processing pipeline, avoiding materialization of intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning. In contrast, the operator-at-a-time execution paradigm produces fully… 
Big data availability: Selective partial checkpointing for in-memory database queries
TLDR
This work proposes a new approach for intra-query checkpointing that produces an optimal checkpoint solution for a fixed checkpointing budget to minimise overhead on in-memory column-oriented database clusters.
Recycling in pipelined query evaluation
TLDR
The novelty of this paper is to show how recycling can successfully be applied in pipelined query executors, by tracking the benefit of materializing possible intermediate results and then choosing the ones making best use of a limited intermediate result cache.
The DBMS-your Big Data
TLDR
A partial-loading-aware query processing paradigm and data storage model are developed that can make a 1.2 TB dataset ready for querying in less than 3 minutes on a single server-class machine while maintaining good query processing performance.
Cache-Based Multi-Query Optimization for Data-Intensive Scalable Computing Frameworks
TLDR
A novel method combines in-memory cache primitives and multi-query optimization to improve the efficiency of data-intensive, scalable computing frameworks, and shows significant benefits of worksharing for both TPC-DS workloads and detailed micro-benchmarks.
A quantitative study of two matrix clustering algorithms
TLDR
A novel matrix clustering algorithm is considered that performs attribute replication during the branch and bound search and is compared with the best one of the earlier algorithms using both real and synthetic workloads.
Scalable and Efficient Analysis of Large High-Dimensional Data Sets in the Context of Recurrence Analysis
TLDR
This thesis introduces scalable recurrence analysis (SRA), an alternative computing approach that subdivides a recurrence matrix into multiple submatrices and reduces the runtime for analysing time series exceeding one million data points from hours or days to minutes.
Selecting Subexpressions to Materialize at Datacenter Scale
TLDR
This work focuses on the problem of subexpression selection for large workloads, i.e., selecting common parts of job plans and materializing them to speed up the evaluation of subsequent jobs, and introduces BigSubs, a vertex-centric graph algorithm that iteratively chooses in parallel which subexpressions to materialize and which subexpressions to use for evaluating each job.
Computation Reuse in Analytics Job Service at Microsoft
TLDR
This paper describes a computation reuse framework, named CLOUDVIEWS, built to address the computation overlap problem in Microsoft's SCOPE job service, and presents a detailed analysis of production workloads to motivate the computation overlap problem and the possible gains from computation reuse.
Database-inspired optimizations for statistical analysis
TLDR
This work proposes to vastly improve the execution of R scripts by interpreting them as a declaration of intent rather than an imperative order set in stone, which allows for optimization techniques from the columnar data management research field to be applied.
Many-query join: efficient shared execution of relational joins on modern hardware
TLDR
This paper proposes many-query join (MQJoin), a novel method for sharing the execution of a join that can efficiently deal with hundreds of concurrent queries by minimizing redundant work and making efficient use of main-memory bandwidth and multi-core architectures.

References

Flexible and efficient IR using array databases
TLDR
It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing including compression, score materialization and quantization, such as employed by custom-built IR systems.
Dynamic Materialized Views
TLDR
Experimental results in Microsoft SQL Server show that, compared with conventional materialized views, dynamic materialized views greatly reduce storage requirements and maintenance costs while achieving better query performance with improved buffer pool efficiency.
Efficient exploitation of similar subexpressions for query processing
TLDR
This work introduces a light-weight and effective mechanism to detect potential sharing opportunities among expressions and presents the first comprehensive solution covering all aspects of the problem: detection, construction, and cost-based optimization.
Optimizing queries using materialized views: a practical, scalable solution
TLDR
A fast and scalable algorithm for determining whether part or all of a query can be computed from materialized views and how it can be incorporated in transformation-based optimizers is presented.
Breaking the memory wall in MonetDB
TLDR
This paper reports how research around the MonetDB database system has led to a redesign of database architecture in order to take advantage of modern hardware, and in particular to avoid hitting the memory wall.
Super-Scalar RAM-CPU Cache Compression
TLDR
This work proposes three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs and compares these algorithms with compression techniques used in (commercial) database and information retrieval systems.
Knapsack Problems: Algorithms and Computer Implementations
TLDR
This paper focuses on the part of the knapsack problem where the problem of bin packing is concerned and investigates the role of computer codes in the solution of this problem.
Dynamic Materialized View Management Based on Predicates
TLDR
This paper proposes a dynamic predicate-based partitioning approach that can support a wide range of OLAP queries. Extensive performance studies using TPC-H benchmark data on IBM DB2 yield encouraging results, indicating that the approach is highly feasible.
Adaptive Database Caching with DBCache
Dynamic Materialization of Query Views for Data Warehouse Workloads
  • T. Phan, Wen-Syan Li
  • Computer Science
  • 2008 IEEE 24th International Conference on Data Engineering
  • 2008
TLDR
An automated, dynamic MQT management scheme that materializes views and creates indexes in an on-demand fashion as a workload executes and manages them with an LRU cache to maximize the benefit of executing queries with MQTs.