Corpus ID: 10995300

The Myria Big Data Management and Analytics System and Cloud Services

@inproceedings{Wang2017TheMB,
  title={The Myria Big Data Management and Analytics System and Cloud Services},
  author={Jingjing Wang and T. Baker and Magdalena Balazinska and Daniel Halperin and Brandon Haynes and Bill Howe and Dylan Hutchison and Shrainik Jain and Ryan Maas and Parmita Mehta and Dominik Moritz and Brandon Myers and Jennifer Ortiz and Dan Suciu and Andrew Whitaker and Shengliang Xu},
  booktitle={CIDR},
  year={2017}
}
In this paper, we present an overview of the Myria stack for big data management and analytics that we developed in the database group at the University of Washington and that we have been operating as a cloud service aimed at domain scientists around the UW campus. We highlight Myria’s key design choices and innovations and report on our experience with using Myria for various data science use-cases. 
No data left behind: real-time insights from a complex data ecosystem
TLDR
System-PV is a real-time analytics system that masks the complexity of dealing with multiple data sources while offering minimal response times, and extends Spark with a sophisticated data virtualization module that supports multiple applications - from SQL queries to machine learning.
BUDaMaF - Data Management in Cloud Federations
TLDR
The BUDaMaF tries to create an automated uniform way of managing all the data transactions, as well as the data stores themselves, in a polyglot multi-cloud, consisting of a plethora of different machines and data store systems.
Elastic Memory Management for Cloud Data Analytics
TLDR
This work develops an approach for the automatic and elastic management of memory in shared clusters executing data analytics applications that outperforms static memory allocation, leading to fewer query failures when memory is scarce, up to 80% lower garbage collection overheads, and up to 30% lower query times when memory is abundant.
Just-in-time Analytics Over Heterogeneous Data and Hardware
TLDR
This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit emerging hardware heterogeneity by customizing the system implementation based on the available heterogeneous processors – CPUs and GPGPUs.
Extending Apache Spark with a Mediation Layer
TLDR
Spark Mediator is presented, a system that extends the logical data integration capabilities of Apache Spark to the integration of schizophrenia neuroimaging data and is compared with previous data integration systems.
SLAOrchestrator: Reducing the Cost of Performance SLAs for Cloud Data Analytics
TLDR
SLAOrchestrator is a new system designed to reduce the price increases necessary to support performance SLAs in cloud analytics systems by utilizing an efficient combination of elastic query scheduling and multi-tenant resource provisioning algorithms to reduce the cost of performance guarantees.
Handling Evolution in Big Data Architectures
TLDR
This paper analyzes architectures for Big Data processing and analysis described in the literature, with the aim of identifying the most appropriate solution for the evolution problem, and proposes an architecture that supports different kinds of analytical tasks on Big Data retrieved from multiple heterogeneous data sources with different latency.
Magpie: Python at Speed and Scale using Cloud Backends
TLDR
A system is described, coined Magpie, which exposes the popular Pandas API while lazily pushing large chunks of computation into scalable, efficient, and secured database engines, bringing together the ease of use and versatility of Python environments with the enterprise-grade, high-performance query processing of cloud database systems.
Scalable unified data analytics
TLDR
This thesis evaluates data analytics systems that support the data science work-flow by introducing a data science benchmark, Sanzu, and believes that data analysts and scientists would want to use a single system that can perform both data analysis tasks and SQL querying, without requiring data movement between different systems.
Runtime Optimizations for Large-Scale Data Analytics
TLDR
This dissertation presents methods to improve system efficiency for large-scale data analytics, and demonstrates that runtime optimization can significantly improve overall system performance: it can lower query execution times, improve resource utilization, and reduce application failures.

References

SHOWING 1-10 OF 66 REFERENCES
Demonstration of the Myria big data management service
TLDR
This interactive demonstration will guide visitors through an exploration of several key Myria features by interfacing with the live system to analyze big datasets over the web.
Toward elastic memory management for cloud data analytics
TLDR
This work presents several key elements towards elastic memory management in modern big data systems to avoid out-of-memory failures without over-provisioning but also to avoid garbage-collection overheads when possible.
Big-Data Management Use-Case: A Cloud Service for Creating and Analyzing Galactic Merger Trees
TLDR
A service that enables astronomers to study the growth history of galaxies by following their `merger trees' in large-scale astrophysical simulations by using the Myria parallel data management system as back-end and the D3 data visualization library within its graphical front-end.
The Fourth Paradigm: Data-Intensive Scientific Discovery
This presentation will set out the eScience agenda by explaining the current scientific data deluge and the case for a “Fourth Paradigm” for scientific exploration. Examples of data intensive science…
Changing the Face of Database Cloud Services with Personalized Service Level Agreements
TLDR
An approach for generating Personalized Service Level Agreements (PSLAs) that separate cloud users from the details of compute resources behind a cloud database management service is developed and evaluated.
The BigDAWG Polystore System
TLDR
This paper presents a new view of federated databases to address the growing need for managing information that spans multiple data models, and proposes a polystore architecture designed to unify querying over multiple data models.
Comparative evaluation of big-data systems on scientific image analytics workloads
TLDR
This research presents a meta-database management system that automates the very labor-intensive and therefore time-heavy and expensive process of manually cataloging large volumes of image data.
The Hadoop Distributed File System
TLDR
The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.
The Snowflake Elastic Data Warehouse
TLDR
The paper highlights some of the key features of Snowflake: extreme elasticity and availability, semi-structured and schema-less data, time travel, and end-to-end security.
Impala: A Modern, Open-Source SQL Engine for Hadoop
TLDR
This paper presents Impala from a user’s perspective, gives an overview of its architecture and main components and briefly demonstrates its superior performance compared against other popular SQL-on-Hadoop systems.