Hybrid Row-Column Partitioning in Teradata

Mohammed Al-Kateb, Paul Sinclair, Grace Au, Carrie Ballinger. Proc. VLDB Endow.
Data partitioning is an indispensable ingredient of database systems due to the performance improvement it can bring to any given mixed workload. Data can be partitioned horizontally or vertically. While some commercial proprietary and open source database systems have one flavor or mixed flavors of these partitioning forms, Teradata Database offers a unique hybrid row-column store solution that seamlessly combines both of these partitioning schemes. The key feature of this hybrid solution is… 

A Hybrid Partitioning Strategy for NewSQL Databases: The VoltDB Case

This paper proposes a hybrid partitioning approach for NewSQL databases that allows the user to define the vertical and horizontal data partitions and proposes a hash function that considers schema information and data access statistics.

Optimizing UNION ALL Join Queries in Teradata

This paper proposes cost-based pushing of joins into branches of UNION ALL to address the prohibitive cost of spooling all branches, and helps in exposing more efficient join methods which, otherwise, would not be considered by the query optimizer.

NewSQL Through the Looking Glass

The main features of the most prominent NewSQL products are discussed, and benchmarking results analyzing their performance are presented; both analyses are believed to be useful as a guide for future choices of NewSQL technologies.

Row-Store / Column-Store / Hybrid-Store

Three of the most widely used main memory database system layouts available today are row store, column store and hybrid store. In this paper, their similarities and differences regarding their

A High-Performance Distributed Relational Database System for Scalable OLAP Processing

This work presents HRDBMS, a fully implemented distributed shared-nothing relational database developed to improve the scalability of OLAP queries. It achieves high scalability through a principled combination of techniques from relational and big data systems with novel communication and work-distribution techniques.

A Simple Semantic-Based Data Storage Layout for Querying Point Clouds

A simple data layout that makes use of the semantics of semantic point cloud data and allows for quick queries is presented, and the query results suggest that the approach can be successfully used to handle point and range queries on large point clouds.

Real-time analytics, hybrid transactional/analytical processing, in-memory data management, and non-volatile memory

It is concluded that the emergence of a new generation of NVM will greatly stimulate its use in in-memory HTAP systems; it is also examined whether these systems use non-volatile memory and, if so, in what manner.

Application of Dynamic Fragmentation Methods in Multimedia Databases: A Review

An in-depth review of the literature related to dynamic fragmentation of multimedia databases is provided, to identify the main challenges, technologies employed, types of fragmentation used, and characteristics of the cost model.

An Autonomous Hybrid Data Partition for NewSQL DBs

This thesis proposes an automated approach to hybrid data partitioning that reorganizes data based on the current workload of NewSQL DBs, which combine high scalability and availability with ACID support.

Enterprise-wide Machine Learning using Teradata Vantage: An Integrated Analytics Platform

The proposed TD Vantage is outlined and its capabilities are demonstrated through three proofs of concept, viz. image data using TensorFlow, text data using Spark, and transaction data using Aster, with Teradata orchestrating interactions among the various components.



C-Store: A Column-oriented DBMS

Preliminary performance data on a subset of TPC-H is presented and it is shown that the system the team is building, C-Store, is substantially faster than popular commercial products.

Partitioning Key Selection for a Shared-nothing Parallel Database System

This study shows that by following a systematic methodology, especially for the partitioning key selection and associated relation grouping issues, the entire data placement strategy for a given database schema and workload can be determined in a very efficient manner.

Locality-aware Partitioning in Parallel Database Systems

This paper presents a novel partitioning scheme called predicate-based reference partition (or PREF) that allows sets of tables to be co-partitioned based on given join predicates, which helps to effectively reduce the runtime of queries under a given workload when compared to existing partitioning approaches.

Integrating compression and execution in column-oriented database systems

This paper shows how compression schemes not traditionally used in row-oriented DBMSs can be applied to column-oriented systems and evaluates a set of compression schemes and shows that the best scheme depends not only on the properties of the data but also on the nature of the query workload.

Optimizing queries over partitioned tables in MPP systems

This paper presents a concise and unified representation for partitioned tables, devises optimization techniques to generate query plans that can defer decisions on accessing certain partitions to query run-time, and demonstrates that the resulting query plans distinctly outperform conventional query plans in a variety of scenarios.

Query optimization techniques for partitioned tables

This work develops new techniques to generate efficient plans for SQL queries involving multiway joins over partitioned tables with low optimization overhead, designed for easy incorporation into bottom-up query optimizers that are in wide use today.

The SAP HANA Database -- An Architecture Overview

This paper highlights the architectural concepts employed in the SAP HANA database and reports on insights gathered with the SAP HANA database in real-world enterprise application scenarios.

Column oriented Database Systems

This tutorial presents an overview of column-oriented database system technology and addresses questions about how easily a major row-based system can achieve column-store performance and the new applications that can potentially be enabled by column-stores.

Automating physical database design in a parallel database

This work seeks to automate the process of data partitioning in a shared-nothing parallel database system by using the query optimizer itself both to recommend candidate partitions for each table that will benefit each query in the workload, and to evaluate various combinations of these candidates.

Integrating vertical and horizontal partitioning into automated physical database design

This paper presents novel techniques for designing a scalable solution to this integrated physical design problem that takes both performance and manageability into account and implements it on Microsoft SQL Server.