• Corpus ID: 10467552

Scalability! But at what COST?

@inproceedings{McSherry2015ScalabilityBA,
  title={Scalability! But at what COST?},
  author={Frank McSherry and Michael Isard and Derek Gordon Murray},
  booktitle={HotOS},
  year={2015}
}
We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread. The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system's scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads. We… 
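The COST metric described in the abstract can be computed mechanically from measured runtimes. A minimal sketch, assuming hypothetical inputs (a single-threaded baseline runtime and a map from core count to system runtime; the function name `cost` and its signature are illustrative, not from the paper):

```python
def cost(single_thread_secs, system_times):
    """Return the smallest hardware configuration (core count) at which
    the system outperforms the single-threaded baseline.

    single_thread_secs: runtime of a competent single-threaded
        implementation, in seconds.
    system_times: dict mapping core count -> measured system runtime.

    Returns None if no measured configuration beats the baseline
    (an "unbounded" COST).
    """
    for cores in sorted(system_times):
        if system_times[cores] < single_thread_secs:
            return cores
    return None


# Example: a baseline of 300 s versus system runtimes at three scales.
print(cost(300.0, {16: 400.0, 64: 280.0, 128: 150.0}))  # 64
print(cost(100.0, {16: 400.0, 64: 280.0, 128: 150.0}))  # None (unbounded)
```

The point of the metric is visible in the second call: a system can scale well (runtime falls as cores are added) and still never repay its overheads relative to the single-threaded baseline.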

Figures and Tables from this paper

Supporting Fine-grained Dataflow Parallelism in Big Data Systems
TLDR
This paper analyzes the data processing cores of state-of-the-art big data systems to find the cause for scalability problems, and identifies design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance.
Performance Scaling of Cassandra on High-Thread Count Servers
TLDR
This paper describes the experiences studying the performance scaling characteristics of Cassandra, a popular open-source, column-oriented database, on a single high-thread count dual socket server, and shows how by taking into account specific knowledge of the underlying topology of the server architecture, it can achieve substantial improvements in performance scalability.
Cost-Aware Streaming Data Analysis: Distributed vs Single-Thread
TLDR
This work presents an empirical study that compares the cost of two performance equivalent solutions for a real streaming data analysis task for the Telecommunication industry and shows that the most cost-effective solution depends on the dataset size.
Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data
TLDR
Flare is presented, an accelerator module for Spark that delivers order-of-magnitude speedups on scale-up architectures for a large class of applications. Inspired by query compilation techniques from main-memory database systems, Flare incorporates a code generation strategy designed to match the unique aspects of Spark and the characteristics of scale-up architectures.
Towards implicit parallel programming for systems
TLDR
This thesis establishes the idea of stateful functional programming in the context of a server and argues that the associated compiler and dataflow-based runtime system can also solve problems that are directly connected to a parallel execution.
Vortex: Extreme-Performance Memory Abstractions for Data-Intensive Streaming Applications
TLDR
This work develops a set of algorithms called Vortex that force the application to generate access violations during processing of the stream, which are transparently handled in such a way as to create the illusion of an infinite buffer that fits into a regular C/C++ pointer.
SMR: Scalable MapReduce for Multicore Systems
  • Yu Zhang, Yu-Fen Yu, Jiankang Chen
  • Computer Science
    2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • 2018
TLDR
A multithreaded model, Sthread, is proposed which provides isolated address spaces between threads to avoid contention, and provides an unbounded-channel abstraction for asynchronously passing unbounded data streams between threads.
A Case Against Tiny Tasks in Iterative Analytics
TLDR
An alternative approach is proposed that relies on an auto-parallelizing compiler tightly integrated with the MPI runtime, illustrating the opposite end of the spectrum where task granularities are as large as possible.
Flare: Native Compilation for Heterogeneous Workloads in Apache Spark
TLDR
Flare is presented: a new back-end for Spark that brings performance closer to the best SQL engines, without giving up the added expressiveness of Spark.
Distributed Machine Learning - but at what COST?
TLDR
The results indicate that while current-generation dataflow systems can robustly scale with increasing dataset size, they are surprisingly inefficient at training machine learning models and need substantial resources to come within reach of the performance of single-machine libraries.

References

SHOWING 1-10 OF 30 REFERENCES
Making Sense of Performance in Data Analytics Frameworks
TLDR
It is found that CPU (and not I/O) is often the bottleneck, and improving network performance can improve job completion time by a median of at most 2%, and the causes of most stragglers can be identified.
Naiad: a timely dataflow system
TLDR
It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.
Spinning Fast Iterative Data Flows
TLDR
This work proposes a method to integrate incremental iterations, a form of workset iterations, with parallel dataflows, and presents an extension to the programming model for incremental iterations that compensates for the lack of mutable state in dataflows and allows exploiting the sparse computational dependencies inherent in many iterative algorithms.
Java support for data-intensive systems: experiences building the telegraph dataflow system
TLDR
This paper highlights the pleasures of coding with Java, and some of the pains of coding around Java in order to obtain good performance in a data-intensive server, and presents concrete suggestions for evolving Java's interfaces to better suit serious software systems development.
A bridging model for parallel computation
TLDR
The bulk-synchronous parallel (BSP) model is introduced as a candidate for this role, and results quantifying its efficiency both in implementing high-level language features and algorithms, as well as in being implemented in hardware.
A lightweight infrastructure for graph analytics
TLDR
This paper argues that existing DSLs can be implemented on top of a general-purpose infrastructure that supports very fine-grain tasks, implements autonomous, speculative execution of these tasks, and allows application-specific control of task scheduling policies.
TritonSort: A Balanced Large-Scale Sorting System
We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100 TB of input data spread across 832 disks in
The scalable commutativity rule: designing scalable software for multicore processors
TLDR
This paper introduces the following rule: Whenever interface operations commute, they can be implemented in a way that scales, which aids developers in building more scalable software starting from interface design and carrying on through implementation, testing, and evaluation.
DimmWitted: A Study of Main-Memory Statistical Analytics
TLDR
This first study of the tradeoff space of access methods and replication to support statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine discovers that there are tradeoffs between hardware and statistical efficiency.
Ligra: a lightweight graph processing framework for shared memory
TLDR
This paper presents a lightweight graph processing framework that is specific for shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write and significantly more efficient than previously reported results using graph frameworks on machines with many more cores.