• Corpus ID: 3075716

Evolving Databases for New-Gen Big Data Applications

@inproceedings{Barber2017EvolvingDF,
  title={Evolving Databases for New-Gen Big Data Applications},
  author={Ronald Barber and Christian Garcia-Arellano and Ronen Grosman and Ren{\'e} M{\"u}ller and Vijayshankar Raman and Richard Sidle and Matt Spilchen and Adam J. Storm and Yuanyuan Tian and Pinar T{\"o}z{\"u}n and Daniel C. Zilio and Matthew Huras and Guy M. Lohman and Chandrasekaran Mohan and Fatma {\"O}zcan and Hamid Pirahesh},
  booktitle={CIDR},
  year={2017}
}
The rising popularity of large-scale real-time analytics applications (real-time inventory/pricing, mobile apps that give you suggestions, fraud detection, risk analysis, etc.) emphasizes the need for distributed data management systems that can handle fast transactions and analytics concurrently. Efficient processing of transactional and analytical requests, however, requires different optimizations and architectural decisions in a system. This paper presents the Wildfire system, which targets…

Hybrid Transactional/Analytical Processing: A Survey
TLDR
This tutorial quickly reviews the historical progression of OLTP and OLAP systems, discusses the driving factors for HTAP, and provides a deep technical analysis of existing and emerging HTAP solutions, detailing their key architectural differences and trade-offs.
Operationalizing Analytics with NewSQL
TLDR
This paper aims to provide a structured look into the features and capabilities offered by NewSQL systems that can be leveraged to allow Data Analysis over a variety of data types and an overview of Realtime Analytics offerings, Map Reduce capabilities and hybrid (transactional and analytical) features.
Greenplum: A Hybrid Database for Transactional and Analytical Workloads
TLDR
This paper augments Greenplum into a hybrid system that serves both OLTP and OLAP workloads, with the capability to separate OLAP and OLTP workloads into more suitable query processing modes, and proposes a global deadlock detector to increase the concurrency of query processing.
Hybrid Transactional and Analytical Processing Databases: A Systematic Literature Review
TLDR
This paper provides a comprehensive summary of these implementations, giving an overview of the last decade of research on the emerging sector of HTAP databases and discussing the fundamental technologies involved.
L-Store: A Real-time OLTP and OLAP System
TLDR
This paper presents the Lineage-based Data Store (L-Store), which provides real-time processing of transactional and analytical workloads within a single unified engine by introducing a novel lineage-based storage architecture, and demonstrates its superiority over state-of-the-art approaches in a comprehensive experimental evaluation.
Components and Development in Big Data System: A Survey
TLDR
The components and modern optimization technologies developed for Big Data are presented, which helps in choosing the most suitable components and architecture from various Big Data technologies based on requirements.
How Good is My HTAP System?
TLDR
A new concept called the throughput frontier is introduced, which visualizes both transactional and analytical throughput in a single 2D graph, and a freshness metric is defined, which quantifies how recent the snapshot of the data seen by each analytical query in an HTAP system is.
F1 Lightning
TLDR
The design and experiences of F1 Lightning, a system built and deployed to meet the challenge of supporting both new and legacy applications that demand transparent fast queries and transactions from this combination, are reported on.
CirroData: Yet Another SQL-on-Hadoop Data Analytics Engine with High Performance
This paper presents CirroData, a high-performance SQL-on-Hadoop system designed for Big Data analytics workloads. As a home-grown enterprise-level online analytical processing (OLAP) system with more
Fault Tolerant Data Stream Processing in Cooperation with OLTP Engine
TLDR
The main focus is to develop new data stream processing methodologies, such as fault tolerance in cooperation with the OLTP engine, as part of a new project in Japan to develop data stream processing technologies in the HTAP environment.

References

SHOWING 1-10 OF 24 REFERENCES
Wildfire: Concurrent Blazing Data Ingest and Analytics
TLDR
A simplified mobile application uses Wildfire to recommend advertising to mobile customers based upon their distance from stores and their interest in products sold by these stores, while continuously graphing analytics results as those customers move and respond to the ads with purchases.
The SAP HANA Database -- An Architecture Overview
TLDR
This paper highlights the architectural concepts employed in the SAP HANA database and reports on insights gathered with the SAP HANA database in real-world enterprise application scenarios.
HyPer: A hybrid OLTP&OLAP main memory database system based on virtual memory snapshots
TLDR
This work presents an efficient hybrid system, called HyPer, that can handle both OLTP and OLAP simultaneously by using hardware-assisted replication mechanisms to maintain consistent snapshots of the transactional data.
Tutorial: SQL-on-Hadoop Systems
TLDR
The term SQL-on-Hadoop is used to refer to systems that provide some level of declarative SQL(-like) processing over HDFS and NoSQL data sources, using architectures that include computational or storage engines compatible with Apache Hadoop.
Hive - a petabyte scale data warehouse using Hadoop
TLDR
Hive is presented, an open-source data warehousing solution built on top of Hadoop that supports queries expressed in a SQL-like declarative language, HiveQL, which are compiled into map-reduce jobs that are executed using Hadoop.
HAWQ: a massively parallel processing SQL engine in hadoop
TLDR
The novel design of HAWQ is presented, including query processing, the scalable software interconnect based on UDP protocol, transaction management, fault tolerance, read optimized storage, the extensible framework for supporting various popular Hadoop based data stores and formats, and various optimization choices the authors considered to enhance the query performance.
MonetDB/X100: Hyper-Pipelining Query Execution
TLDR
An in-depth investigation into why database systems tend to achieve only low IPC on modern CPUs in compute-intensive application areas, leading to a new set of guidelines for designing a query processor and to a query engine for the MonetDB system that follows these guidelines.
The log-structured merge-tree (LSM-tree)
TLDR
The log-structured merge-tree (LSM-tree) is a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts (and deletes) over an extended period.
Impala: A Modern, Open-Source SQL Engine for Hadoop
TLDR
This paper presents Impala from a user’s perspective, gives an overview of its architecture and main components and briefly demonstrates its superior performance compared against other popular SQL-on-Hadoop systems.
Megastore: Providing Scalable, Highly Available Storage for Interactive Services
TLDR
Megastore provides fully serializable ACID semantics within fine-grained partitions of data, which allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters.