Instant Loading for Main Memory Databases
@article{Mhlbauer2013InstantLF, title={Instant Loading for Main Memory Databases}, author={Tobias M{\"u}hlbauer and Wolf R{\"o}diger and Robert Seilbeck and Angelika Reiser and Alfons Kemper and Thomas Neumann}, journal={Proc. VLDB Endow.}, year={2013}, volume={6}, pages={1702-1713} }
eScience and big data analytics applications are facing the challenge of efficiently evaluating complex queries over vast amounts of structured text data archived in network storage solutions. […] With Instant Loading, we contribute a novel CSV loading approach that allows scalable bulk loading at wire speed. This is achieved by optimizing all phases of loading for modern super-scalar multi-core CPUs.
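The abstract does not spell out how loading is spread across cores. One common way to make CSV parsing scale is to cut the input into byte ranges and move each cut forward to the next record delimiter, so workers parse disjoint, complete records and results can be concatenated in file order. The Python sketch below illustrates only that partitioning idea. It is not the Instant Loading implementation. The function names are hypothetical, and it assumes `\n` line endings, UTF-8 text, and no quoted fields containing commas or newlines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def chunk_boundaries(data: bytes, num_chunks: int) -> List[Tuple[int, int]]:
    """Split data into about num_chunks byte ranges of complete records.

    Each tentative split point is advanced to just past the next newline byte.
    Assumption: newlines never occur inside quoted fields.
    """
    n = len(data)
    step = max(1, n // max(1, num_chunks))
    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < n:
        end = min(start + step, n)
        if end < n:
            # Search from end - 1 so a split that already follows a newline stays put.
            nl = data.find(b"\n", end - 1)
            end = n if nl == -1 else nl + 1
        bounds.append((start, end))
        start = end
    return bounds


def parse_chunk(data: bytes, start: int, end: int) -> List[List[str]]:
    """Parse one range of complete records (naive split, no quoting support)."""
    rows: List[List[str]] = []
    for line in data[start:end].split(b"\n"):
        if line:  # skip empty pieces, e.g. the one after a trailing newline
            rows.append(line.decode("utf-8").split(","))
    return rows


def load_csv_parallel(data: bytes, workers: int = 4) -> List[List[str]]:
    """Parse chunks concurrently and concatenate their rows in file order."""
    bounds = chunk_boundaries(data, workers)
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        # Executor.map yields results in input order, preserving record order.
        parts = list(pool.map(lambda b: parse_chunk(data, b[0], b[1]), bounds))
    return [row for part in parts for row in part]


if __name__ == "__main__":
    print(load_csv_parallel(b"1,alice\n2,bob\n3,carol\n4,dave\n", workers=3))
    # [['1', 'alice'], ['2', 'bob'], ['3', 'carol'], ['4', 'dave']]
```

Python threads share the interpreter lock, so this sketch shows the partitioning logic rather than a real speedup. A loader aiming at wire speed would run the per-chunk parsers as native code on separate cores.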
78 Citations
In-Memory Big Data Management and Processing: A Survey
- Computer Science | IEEE Transactions on Knowledge and Data Engineering
- 2015
This survey aims to provide a thorough review of a wide range of in-memory data management and processing proposals and systems, including both data storage systems and data processing frameworks.
Chapter Five - Comparative Study of Different In-Memory (No/New) SQL Databases
- Computer Science | Adv. Comput.
- 2018
CIAO: An Optimization Framework for Client-Assisted Data Loading
- Computer Science | 2021 IEEE 37th International Conference on Data Engineering (ICDE)
- 2021
This paper presents CIAO, a tunable system that lets clients cooperate with the server to enable efficient partial loading and data skipping for a given workload, and proposes an efficient algorithm that selects a near-optimal set of predicates to push down within a given budget.
SQL- and Operator-centric Data Analytics in Relational Main-Memory Databases
- Computer Science | EDBT
- 2017
It is shown that relational main-memory database systems are capable of executing analytical algorithms in a fully transactional environment while still exceeding the performance of state-of-the-art analytical systems, rendering the division of data management and data analytics unnecessary.
The DBMS - your big data sommelier
- Computer Science | 2015 IEEE 31st International Conference on Data Engineering
- 2015
A partial-loading-aware query processing paradigm and data storage model are developed that can make a 1.2 TB dataset ready for querying in less than 3 minutes on a single server-class machine while maintaining good query processing performance.
Parallel in-situ data processing with speculative loading
- Computer Science | SIGMOD Conference
- 2014
The results show that SCANRAW with speculative loading achieves optimal performance for a query sequence at any point in the processing, and maximizes resource utilization for the entire workload execution while speculatively loading data and without interfering with normal query processing.
SCANRAW: A Database Meta-Operator for Parallel In-Situ Processing and Loading
- Computer Science | TODS
- 2015
This article proposes SCANRAW, a novel database meta-operator for in-situ processing over raw files that integrates data loading and external tables seamlessly, while preserving their advantages: optimal performance across a query workload and zero time-to-query.
Adaptive Query Processing on RAW Data
- Computer Science | Proc. VLDB Endow.
- 2014
Database systems deliver impressive performance for large classes of workloads as the result of decades of research into optimizing database engines. High performance, however, is achieved at the…
Main Memory Database Recovery
- Computer Science | ACM Comput. Surv.
- 2021
This survey aims to provide a thorough review of in-memory database recovery techniques and discusses the recovery strategies of a representative sample of modern in-memory databases.
Distributed caching for processing raw arrays
- Computer Science | SSDBM
- 2018
A distributed framework for cost-based caching of multi-dimensional arrays in native format is introduced and cache eviction and placement heuristic algorithms that consider the historical query workload are designed.
References
Showing references 1-10 of 36
NoDB: efficient query execution on raw data files
- Computer Science | SIGMOD Conference
- 2012
This paper presents the design and roadmap of NoDB, a new paradigm of database systems that do not require data loading while still maintaining the whole feature set of a modern database system, and implements it, bringing an unprecedented positive effect on usability and performance.
H-store: a high-performance, distributed main memory transaction processing system
- Computer Science | Proc. VLDB Endow.
- 2008
The demonstration presented here provides insight on the development of a distributed main memory OLTP database and allows for the further study of the challenges inherent in this operating environment.
HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads
- Computer Science | Proc. VLDB Endow.
- 2009
This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the prototype built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.
HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots
- Computer Science | 2011 IEEE 27th International Conference on Data Engineering
- 2011
This work presents an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data.
Here are my Data Files. Here are my Queries. Where are my Results?
- Computer Science | CIDR
- 2011
To fully exploit DBMS features, the user must define a schema, load the data, tune the system for the expected workload, and answer several questions, creating a formidable and time-consuming hurdle.
SIMD-Scan: Ultra Fast in-Memory Table Scan using on-Chip Vector Processing Units
- Computer Science | Proc. VLDB Endow.
- 2009
This paper shows that utilizing the embedded Vector Processing Units (VPUs) found in standard superscalar processors can speed up main-memory full table scans by factors, without changing the hardware architecture and thereby without additional power consumption.
A comparison of approaches to large-scale data analysis
- Computer Science | SIGMOD Conference
- 2009
A benchmark consisting of a collection of tasks that are run on an open source version of MR as well as on two parallel DBMSs shows a dramatic performance difference between the two paradigms.
The adaptive radix tree: ARTful indexing for main-memory databases
- Computer Science | 2013 IEEE 29th International Conference on Data Engineering (ICDE)
- 2013
Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data…
MonetDB/X100: Hyper-Pipelining Query Execution
- Computer Science | CIDR
- 2005
An in-depth investigation into why database systems tend to achieve only low IPC on modern CPUs in compute-intensive application areas, yielding a new set of guidelines for designing a query processor, and a new query engine for the MonetDB system that follows these guidelines.
DBMSs on a Modern Processor: Where Does Time Go?
- Computer Science | VLDB
- 1999
This paper examines four commercial DBMSs running on an Intel Xeon and NT 4.0 and introduces a framework for analyzing query execution time, and finds that database developers should not expect the overall execution time to decrease significantly without addressing stalls related to subtle implementation issues.