Corpus ID: 10467552

Scalability! But at what COST?

@inproceedings{McSherry2015ScalabilityBA,
  title={Scalability! But at what COST?},
  author={Frank McSherry and Michael Isard and Derek Gordon Murray},
  booktitle={USENIX Workshop on Hot Topics in Operating Systems},
  year={2015}
}
We offer a new metric for big data platforms, COST, or the Configuration that Outperforms a Single Thread. The COST of a given platform for a given problem is the hardware configuration required before the platform outperforms a competent single-threaded implementation. COST weighs a system's scalability against the overheads introduced by the system, and indicates the actual performance gains of the system, without rewarding systems that bring substantial but parallelizable overheads. We… 
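
As a rough, back-of-the-envelope illustration of how the metric might be applied (the numbers below are invented for the example and do not come from the paper), a small Python sketch that finds the smallest measured configuration beating a single-threaded baseline could look like:

    # Hypothetical sketch of computing COST: the smallest configuration (core
    # count) at which a scalable system's running time beats a competent
    # single-threaded baseline. All values are invented for illustration.

    single_thread_seconds = 300  # competent single-threaded implementation

    # (cores, seconds) measurements for a hypothetical scalable system
    system_measurements = [
        (1, 2700),
        (16, 900),
        (128, 350),
        (512, 240),
    ]

    def cost(measurements, baseline_seconds):
        """Return the smallest core count whose running time beats the
        baseline, or None if no measured configuration does (an 'unbounded'
        COST)."""
        beating = [cores for cores, seconds in sorted(measurements)
                   if seconds < baseline_seconds]
        return beating[0] if beating else None

    print(cost(system_measurements, single_thread_seconds))  # -> 512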

Citations

Supporting Fine-grained Dataflow Parallelism in Big Data Systems

This paper analyzes the data processing cores of state-of-the-art big data systems to find the causes of scalability problems, and identifies design patterns in the code that are suitable for pipeline and task-level parallelism, potentially increasing application performance.

Performance Scaling of Cassandra on High-Thread Count Servers

This paper describes the experiences studying the performance scaling characteristics of Cassandra, a popular open-source, column-oriented database, on a single high-thread count dual socket server, and shows how by taking into account specific knowledge of the underlying topology of the server architecture, it can achieve substantial improvements in performance scalability.

Cost-Aware Streaming Data Analysis: Distributed vs Single-Thread

This work presents an empirical study that compares the cost of two performance equivalent solutions for a real streaming data analysis task for the Telecommunication industry and shows that the most cost-effective solution depends on the dataset size.
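
As a purely illustrative aside (not taken from that study), the shape of such a comparison reduces to simple arithmetic over node counts, hourly prices, and runtimes; all figures below are hypothetical:

    # Hypothetical cost comparison between a single-node, single-threaded job
    # and a distributed version of the same job. Prices and runtimes are
    # invented purely to illustrate the trade-off; they are not measurements.

    def job_cost(nodes, hourly_price, runtime_hours):
        return nodes * hourly_price * runtime_hours

    single_thread = job_cost(nodes=1, hourly_price=0.50, runtime_hours=6.0)
    distributed = job_cost(nodes=16, hourly_price=0.10, runtime_hours=0.75)

    print("single-thread:", single_thread)  # 3.0
    print("distributed:  ", distributed)    # 1.2

Which option wins flips as the dataset, and hence the runtime, grows or shrinks, which is the dependence on dataset size that the study reports.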

Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data

Flare is presented, an accelerator module for Spark that delivers order-of-magnitude speedups on scale-up architectures for a large class of applications; inspired by query compilation techniques from main-memory database systems, it incorporates a code generation strategy designed to match the unique aspects of Spark and the characteristics of scale-up architectures.

Towards implicit parallel programming for systems

This thesis establishes the idea of stateful functional programming in the context of a server and argues that the associated compiler and dataflow-based runtime system can also solve problems that are directly connected to parallel execution.

Vortex: Extreme-Performance Memory Abstractions for Data-Intensive Streaming Applications

This work develops a set of algorithms called Vortex that force the application to generate access violations during processing of the stream; these violations are transparently handled in a way that creates the illusion of an infinite buffer that fits into a regular C/C++ pointer.

SMR: Scalable MapReduce for Multicore Systems

  • Yu Zhang, Yu-Fen Yu, Jiankang Chen
  • Computer Science
    2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS)
  • 2018
A multithreaded model, Sthread, is proposed that provides isolated address spaces between threads to avoid contention, and an unbounded-channel abstraction for asynchronously passing unbounded data streams between threads.
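
As a loose analogy only (this is not SMR's actual interface), the combination of isolated address spaces and asynchronous channels can be sketched in Python with processes and a queue:

    # Loose analogy to the Sthread idea, not SMR's API: workers run as
    # separate processes, so each has an isolated address space, and they
    # communicate through an asynchronous, effectively unbounded queue.
    from multiprocessing import Process, Queue

    def producer(channel):
        for item in range(5):
            channel.put(item)   # asynchronous send; no shared mutable state
        channel.put(None)       # end-of-stream marker

    def consumer(channel):
        while True:
            item = channel.get()
            if item is None:
                break
            print("received", item)

    if __name__ == "__main__":
        channel = Queue()       # maxsize=0, i.e. effectively unbounded
        p = Process(target=producer, args=(channel,))
        c = Process(target=consumer, args=(channel,))
        p.start(); c.start()
        p.join(); c.join()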

A Case Against Tiny Tasks in Iterative Analytics

An alternative approach is proposed that relies on an auto-parallelizing compiler tightly integrated with the MPI runtime, illustrating the opposite end of the spectrum where task granularities are as large as possible.

Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks

It is demonstrated that explicitly separating the use of different resources simplifies reasoning about performance without sacrificing performance, and allows for new optimizations to improve performance.

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

Flare is presented: a new back-end for Spark that brings performance closer to the best SQL engines, without giving up the added expressiveness of Spark.
...

References

SHOWING 1-10 OF 28 REFERENCES

Making Sense of Performance in Data Analytics Frameworks

It is found that CPU (and not I/O) is often the bottleneck, that improving network performance can improve job completion time by a median of at most 2%, and that the causes of most stragglers can be identified.

Naiad: a timely dataflow system

It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.

Spinning Fast Iterative Data Flows

This work proposes a method to integrate incremental iterations, a form of workset iterations, with parallel dataflows, and presents an extension to the programming model for incremental iterations that compensates for the lack of mutable state in dataflow and allows the sparse computational dependencies inherent in many iterative algorithms to be exploited.

Java support for data-intensive systems: experiences building the telegraph dataflow system

This paper highlights the pleasures of coding with Java, and some of the pains of coding around Java in order to obtain good performance in a data-intensive server, and presents concrete suggestions for evolving Java's interfaces to better suit serious software systems development.

A lightweight infrastructure for graph analytics

This paper argues that existing DSLs can be implemented on top of a general-purpose infrastructure that supports very fine-grain tasks, implements autonomous, speculative execution of these tasks, and allows application-specific control of task scheduling policies.

TritonSort: A Balanced Large-Scale Sorting System

We present TritonSort, a highly efficient, scalable sorting system. It is designed to process large datasets, and has been evaluated against as much as 100 TB of input data spread across 832 disks…

The scalable commutativity rule: designing scalable software for multicore processors

This paper introduces the following rule: whenever interface operations commute, they can be implemented in a way that scales. The rule aids developers in building more scalable software, starting from interface design and carrying on through implementation, testing, and evaluation.

DimmWitted: A Study of Main-Memory Statistical Analytics

This is the first study of the tradeoff space of access methods and replication for statistical analytics using first-order methods executed in the main memory of a Non-Uniform Memory Access (NUMA) machine; it discovers that there are tradeoffs between hardware efficiency and statistical efficiency.

Ligra: a lightweight graph processing framework for shared memory

This paper presents a lightweight graph processing framework specific to shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write and significantly more efficient than previously reported results using graph frameworks on machines with many more cores.

X-Stream: edge-centric graph processing using streaming partitions

X-Stream is novel in using an edge-centric rather than a vertex-centric implementation of the scatter-gather model, and in streaming completely unordered edge lists rather than performing random access; it competes favorably with existing systems for graph processing.