Corpus ID: 13664494

RHEEMix in the Data Jungle - A Cross-Platform Query Optimizer -

@article{Kruse2018RHEEMixIT,
  title={RHEEMix in the Data Jungle - A Cross-Platform Query Optimizer -},
  author={Sebastian Kruse and Zoi Kaoudi and Jorge-Arnulfo Quian{\'e}-Ruiz and Sanjay Chawla and Felix Naumann and Bertty Contreras},
  journal={ArXiv},
  year={2018},
  volume={abs/1805.03533}
}
In pursuit of efficient and scalable data analytics, the insight that "one size does not fit all" has given rise to a plethora of specialized data processing platforms, and today's complex data analytics are moving beyond the limits of a single platform. To cope with these new requirements, we present a cross-platform optimizer that allocates the subtasks of data analytic tasks to the most suitable platforms. Our main contributions are: (i) a mechanism based on graph transformations to explore…
RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! -
TLDR
Rheem is presented, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms and allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them.
RHEEM: Enabling Cross-Platform Data Processing
Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically…
Optimizing Cross-Platform Data Movement
TLDR
This paper models the data movement problem as a new graph problem, which is proved to be NP-hard, and proposes a novel graph exploration algorithm, which allows Rheem to discover multiple hidden opportunities for cross-platform data processing.
Building your Cross-Platform Application with RHEEM
TLDR
Rheem is a general-purpose cross-platform data processing system that relieves users of the pain of finding the most efficient data processing platform for a given task and allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them.
Parallel query processing in a polystore
TLDR
This paper addresses polystore issues by using the polyglot approach of the CloudMdsQL query language that allows native queries to be expressed as inline scripts and combined with SQL statements for ad-hoc integration, thus allowing for native scripts to be processed in parallel at data store shards.
Polystore++: Accelerated Polystore System for Heterogeneous Workloads
TLDR
Polystore++ is envisioned as an architecture that accelerates existing polystore systems using hardware accelerators (e.g., FPGAs, CGRAs, and GPUs) and can achieve high performance at low power by identifying and offloading components of a polystore system that are amenable to acceleration using specialized hardware.
ML-based Cross-Platform Query Optimization
TLDR
The evaluation shows that the vector-based approach is more efficient and scalable than simply using an ML model and Robopt matches and, in some cases, improves Rheem’s cost-based optimizer in choosing good plans without requiring any tuning effort.

References

SHOWING 1-10 OF 52 REFERENCES
RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! -
TLDR
Rheem is presented, a general-purpose cross-platform data processing system that decouples applications from the underlying platforms and allows users to focus on the business logic of their applications rather than on the mechanics of how to compose and execute them.
RHEEM: Enabling Cross-Platform Data Processing
Solving business problems increasingly requires going beyond the limits of a single data processing platform (platform for short), such as Hadoop or a DBMS. As a result, organizations typically…
Mix ‘n’ match multi-engine analytics
TLDR
IReS, the Intelligent Resource Scheduler for complex analytics workflows executed over multi-engine environments, is presented; it is able to optimize a workflow with respect to a user-defined policy, relying on cost and performance models of the required tasks over the available platforms.
Musketeer: all for one, one for all in data processing systems
TLDR
Musketeer is built, a workflow manager which can dynamically map front-end workflow descriptions to a broad range of back-end execution engines and speeds up realistic workflows by up to 9x by targeting different execution engines, without requiring any manual effort.
SOFA: An extensible logical optimizer for UDF-heavy data flows
TLDR
SOFA is a novel and extensible optimizer for UDF-heavy data flows, which builds on a concise set of properties for describing the semantics of Map/Reduce-style UDFs and a small set of rewrite rules to find a much larger number of semantically equivalent plan rewrites than possible with traditional techniques.
Optimizing analytic data flows for multiple execution engines
TLDR
This paper focuses on optimizing flows for a single objective, namely performance, over multiple execution engines that span a DBMS, a Map-Reduce engine, and an orchestration engine (e.g., NoSQL plus SQL).
Road to Freedom in Big Data Analytics
TLDR
RHEEM provides a three-layer data processing and storage abstraction to achieve both platform independence and interoperability across multiple platforms, and presents a data cleaning application built using some of the ideas of RHEEM.
The Stratosphere platform for big data analytics
TLDR
The overall system architecture design decisions are presented, Stratosphere is introduced through example queries, and the internal workings of the system’s components that relate to extensibility, programming model, optimization, and query execution are explored in depth.
A Demonstration of the BigDAWG Polystore System
TLDR
BigDAWG is presented, a reference implementation of a new architecture for "Big Data" applications that showcases novel approaches for querying across multiple storage engines, data visualization, and scalable real-time analytics.
Opening the Black Boxes in Data Flow Optimization
TLDR
This work designs and implements an optimizer for parallel data flows that does not assume knowledge of semantics or algebraic properties of operators, and can optimize the operator order of nonrelational data flows, a unique feature among today's systems.