Corpus ID: 220754171

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing

@inproceedings{Zaharia2012ResilientDD,
  title={Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing},
  author={Matei A. Zaharia and Mosharaf Chowdhury and Tathagata Das and Ankur Dave and Justin Ma and Murphy McCauley and Michael J. Franklin and Scott Shenker and Ion Stoica},
  booktitle={NSDI},
  year={2012}
}
We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases, keeping data in memory can improve performance by an order of magnitude. To achieve fault tolerance efficiently, RDDs provide a restricted… 
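The lineage-based, restricted abstraction the abstract describes can be illustrated with a small self-contained sketch. This is a toy in plain Python, not Spark's actual API (the `ToyRDD` class and its methods are hypothetical): each dataset records only its parent and the coarse-grained transformation that produced it, so a lost in-memory result can be recomputed from lineage instead of being replicated.

```python
# Toy illustration of lineage-based recovery in the spirit of RDDs.
# Hypothetical sketch, not Spark's API: a dataset remembers its parent and
# the transformation that produced it, so lost results can be recomputed.

class ToyRDD:
    def __init__(self, parent=None, fn=None, source=None):
        self.parent = parent    # lineage: the dataset this one was derived from
        self.fn = fn            # coarse-grained transformation applied to the parent
        self.source = source    # base data (only for the root dataset)
        self._cache = None      # in-memory materialization; may be lost

    def map(self, fn):
        # Transformations are lazy: only lineage is recorded here.
        return ToyRDD(parent=self, fn=lambda data: [fn(x) for x in data])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda data: [x for x in data if pred(x)])

    def collect(self):
        # Materialize by walking the lineage chain; recomputes if the cache was lost.
        if self._cache is None:
            if self.source is not None:
                self._cache = list(self.source)
            else:
                self._cache = self.fn(self.parent.collect())
        return self._cache

base = ToyRDD(source=range(10))
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())   # [0, 4, 16, 36, 64]

# Simulate losing the in-memory copy; lineage lets us recompute it.
evens_squared._cache = None
print(evens_squared.collect())   # [0, 4, 16, 36, 64]
```

The key point the sketch makes: fault tolerance comes from logging the coarse-grained transformations, not from checkpointing or replicating the data itself.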
Citations

Reliable, Memory Speed Storage for Cluster Computing Frameworks
TLDR: Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks, introducing a checkpointing algorithm that guarantees bounded recovery cost and resource-allocation strategies for recomputation under common resource schedulers.

An I/O-efficient and adaptive fault-tolerant framework for distributed graph computations
TLDR: This work proposes dynamically adjusting checkpoint intervals based on a carefully designed cost-analysis model that accounts for the underlying computing workload, reducing archiving overhead while guaranteeing failure-recovery efficiency in large-scale iterative graph-computation systems.
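Cost-model-driven checkpoint intervals of this kind are often motivated by Young's classic first-order approximation, sketched below as background (an assumption for illustration, not the model proposed in that work): the overhead-minimizing interval is roughly the square root of twice the checkpoint cost times the mean time between failures.

```python
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation of the optimal checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint (seconds)
    mtbf_s: mean time between failures (seconds)
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# E.g. a 30 s checkpoint on a cluster with a 6-hour MTBF:
tau = young_interval(30.0, 6 * 3600.0)
print(round(tau))  # 1138 s, i.e. checkpoint roughly every 19 minutes
```

Adaptive schemes like the one above refine this idea by measuring checkpoint cost and failure rates at runtime instead of assuming them fixed.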
Cost-based Fault-tolerance for Parallel Data Processing
TLDR: Experiments show that the cost-based fault-tolerance scheme outperforms all existing strategies and consistently selects the sweet spot for short- and long-running queries as well as for different cluster setups.

In-Memory Indexed Caching for Distributed Data Processing
TLDR: The Indexed DataFrame is introduced: an in-memory cache supporting a dataframe abstraction with indexing capabilities for fast lookup and join operations, and supporting appends with multi-version concurrency control.

Efficient and Programmable Machine Learning on Distributed Shared Memory via Static Analysis
TLDR: This paper presents Orion, a system that statically parallelizes serial for-loop nests that read and write distributed shared memory, scheduling computation on a distributed cluster while preserving fine-grained data dependencies; a machine learning training program parallelized by Orion achieves up to a 3.5× speedup over a data-parallel implementation based on parameter servers, while offering a much more usable programming model.

Bigflow: A General Optimization Layer for Distributed Computing Frameworks
As data volumes grow rapidly, distributed computations are widely employed in data centers as a cheap and efficient way to process large-scale parallel datasets. Various computation models…

Stark: Optimizing In-Memory Computing for Dynamic Dataset Collections
TLDR: Stark is a system designed for optimizing in-memory computing on dynamic dataset collections; it brings elasticity to partitions to balance task execution time and reduce job makespan, and achieves bounded failure-recovery latency by optimizing the data-checkpointing strategy.

Towards Whatever-Scale Abstractions for Data-Driven Parallelism
TLDR: This paper describes ongoing work on extending previous abstractions to support data-driven parallelism for Whatever-Scale Computing, with plans to target rack-scale distributed systems.

Discretized streams: fault-tolerant streaming computation at scale
TLDR: D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes, tolerate stragglers, and compose easily with batch and interactive query models like MapReduce, enabling rich applications that combine these modes.

DtCraft: A distributed execution engine for compute-intensive applications
TLDR: DtCraft is introduced: a modern C++17-based distributed execution engine that efficiently supports a powerful new programming model for building high-performance parallel applications, requiring no understanding of distributed computing from users.

References

Showing 1-10 of 43 references.
Piccolo: Building Fast, Distributed Programs with Partitioned Tables
TLDR: Experiments show Piccolo to be faster than existing data-flow models for many problems, while providing similar fault-tolerance guarantees and a convenient programming interface.

A recoverable distributed shared memory integrating coherence and recoverability
TLDR: This paper proposes a checkpointing mechanism relying on a recoverable distributed shared memory (DSM) to tolerate single-node failures, using standard memories to store both current and recovery data.

Incoop: MapReduce for incremental computations
TLDR: This paper describes the architecture, implementation, and evaluation of Incoop, a generic MapReduce framework for incremental computations that detects changes to the input and automatically updates the output through an efficient, fine-grained result-reuse mechanism.

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
TLDR: The results show that Mesos can achieve near-optimal data locality when sharing the cluster among diverse frameworks, can scale to 50,000 (emulated) nodes, and is resilient to failures.

HaLoop: Efficient Iterative Data Processing on Large Clusters
TLDR: HaLoop, a modified version of the Hadoop MapReduce framework designed to serve iterative applications, dramatically improves their efficiency by making the task scheduler loop-aware and by adding various caching mechanisms.

DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
TLDR: It is shown that excellent absolute performance can be attained (a general-purpose sort of 10^12 bytes of data executes in 319 seconds on a 240-computer, 960-disk cluster), as well as near-linear scaling of execution time on representative applications as the authors vary the number of computers used for a job.

Stateful bulk processing for incremental analytics
TLDR: A generalized architecture for continuous bulk processing (CBP) is presented that raises the level of abstraction for building incremental applications, showing how a small set of flexible dataflow primitives can perform web analytics and mine large-scale, evolving graphs incrementally.

Dryad: distributed data-parallel programs from sequential building blocks
TLDR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.

FlumeJava: easy, efficient data-parallel pipelines
TLDR: The combination of high-level abstractions for parallel data and computation, deferred evaluation and optimization, and efficient parallel primitives yields an easy-to-use system that approaches the efficiency of hand-optimized pipelines.

CIEL: A Universal Execution Engine for Distributed Data-Flow Computing
TLDR: The execution engine provides transparent fault tolerance and distribution to Skywriting scripts and high-performance code written in other programming languages, achieving scalable performance for both iterative and non-iterative algorithms.