Instant Loading for Main Memory Databases

@article{Mhlbauer2013InstantLF,
  title={Instant Loading for Main Memory Databases},
  author={Tobias M{\"u}hlbauer and Wolf R{\"o}diger and Robert Seilbeck and Angelika Reiser and Alfons Kemper and Thomas Neumann},
  journal={Proc. VLDB Endow.},
  year={2013},
  volume={6},
  pages={1702--1713}
}
eScience and big data analytics applications are facing the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. [...] With Instant Loading, we contribute a novel CSV loading approach that allows scalable bulk loading at wire speed. This is achieved by optimizing all phases of loading for modern super-scalar multi-core CPUs.
In-Memory Big Data Management and Processing: A Survey
TLDR
This survey aims to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks.
Chapter Five - Comparative Study of Different In-Memory (No/New) SQL Databases
TLDR
In this chapter, an overview of in-memory databases, with advanced processing, techniques, and case studies, is presented.
CIAO: An Optimization Framework for Client-Assisted Data Loading
TLDR
This paper presents CIAO, a tunable system in which clients cooperate with the server to enable efficient partial loading and data skipping for a given workload, and proposes an efficient algorithm that selects a near-optimal predicate set to push down within a given budget.
SQL- and Operator-centric Data Analytics in Relational Main-Memory Databases
TLDR
It is shown that relational main-memory database systems are capable of executing analytical algorithms in a fully transactional environment while still exceeding the performance of state-of-the-art analytical systems, rendering the division of data management and data analytics unnecessary.
The DBMS - your big data sommelier
TLDR
A query processing paradigm and a data storage model that are partial-loading aware are developed; together they can make a 1.2 TB dataset ready for querying in less than 3 minutes on a single server-class machine while maintaining good query processing performance.
Parallel in-situ data processing with speculative loading
TLDR
The results show that SCANRAW with speculative loading achieves optimal performance for a query sequence at any point in the processing, and maximizes resource utilization for the entire workload execution while speculatively loading data and without interfering with normal query processing.
SCANRAW: A Database Meta-Operator for Parallel In-Situ Processing and Loading
TLDR
This article proposes SCANRAW, a novel database meta-operator for in-situ processing over raw files that integrates data loading and external tables seamlessly, while preserving their advantages: optimal performance across a query workload and zero time-to-query.
Adaptive Query Processing on RAW Data
Database systems deliver impressive performance for large classes of workloads as the result of decades of research into optimizing database engines. High performance, however, is achieved at the
Main Memory Database Recovery
TLDR
This survey aims to provide a thorough review of in-memory database recovery techniques and discusses the recovery strategies of a representative sample of modern in-memory databases.
Distributed caching for processing raw arrays
TLDR
A distributed framework for cost-based caching of multi-dimensional arrays in native format is introduced and cache eviction and placement heuristic algorithms that consider the historical query workload are designed.

References

NoDB: efficient query execution on raw data files
TLDR
A new paradigm in database systems, called NoDB, is designed and implemented; NoDB systems do not require data loading while still maintaining the whole feature set of a modern database system, bringing an unprecedented positive effect in usability and performance.
H-store: a high-performance, distributed main memory transaction processing system
TLDR
The demonstration presented here provides insight on the development of a distributed main memory OLTP database and allows for the further study of the challenges inherent in this operating environment.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
TLDR
This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots
TLDR
This work presents an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data.
Here are my Data Files. Here are my Queries. Where are my Results?
TLDR
To fully exploit DBMS features, the user must define a schema, load the data, tune the system for the expected workload, and answer several questions, creating a formidable and time-consuming hurdle.
SIMD-Scan: Ultra Fast in-Memory Table Scan using on-Chip Vector Processing Units
TLDR
This paper shows that utilizing the embedded Vector Processing Units (VPUs) found in standard superscalar processors can speed up the performance of main-memory full table scan by factors without changing the hardware architecture and thereby without additional power consumption.
A comparison of approaches to large-scale data analysis
TLDR
A benchmark consisting of a collection of tasks that are run on an open source version of MR as well as on two parallel DBMSs shows a dramatic performance difference between the two paradigms.
The adaptive radix tree: ARTful indexing for main-memory databases
Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data
MonetDB/X100: Hyper-Pipelining Query Execution
TLDR
An in-depth investigation into the reasons why database systems tend to achieve only low IPC on modern CPUs in compute-intensive application areas, a new set of guidelines for designing a query processor, and a query processor for the MonetDB system that follows these guidelines.
DBMSs on a Modern Processor: Where Does Time Go?
TLDR
This paper examines four commercial DBMSs running on an Intel Xeon and NT 4.0 and introduces a framework for analyzing query execution time, and finds that database developers should not expect the overall execution time to decrease significantly without addressing stalls related to subtle implementation issues.