Learn More
We present Resilient Distributed Datasets (RDDs), a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are motivated by two types of applications that current computing frameworks handle inefficiently: iterative algorithms and interactive data mining tools. In both cases,(More)
In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized(More)
From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph(More)
Debugging the massive parallel computations that run in today’s datacenters is hard, as they consist of thousands of tasks processing terabytes of data. It is especially hard in production settings, where performance overheads of more than a few percent are unacceptable. To address this challenge, we present Arthur, a new debugger that provides a rich set(More)
Graph data is prevalent in many domains, but it has usually required specialized engines to analyze. This design is onerous for users and precludes optimization across complete workflows. We present GraphFrames, an integrated system that lets users combine graph algorithms, pattern matching and relational queries, and optimizes work across them. GraphFrames(More)
Nearly 15 years ago, Hellerstein, Haas and Wang proposed online aggregation (OLA), a technique that allows users to (1) observe the progress of a query by showing iteratively refined approximate answers, and (2) stop the query execution once its result achieves the desired accuracy. In this demonstration, we present G-OLA, a novel mini-batch execution model(More)
As the emergence of cloud computing brings the potential for large-scale data analysis to a broader community, architectural patterns for data analysis on the cloud, especially those addressing iterative algorithms, are increasingly useful. MapReduce suffers performance limitations for this purpose as it is not inherently designed for iterative algorithms.(More)
Many systems run rich analytics on sensitive data in the cloud, but are prone to data breaches. Hardware enclaves promise data confidentiality and secure execution of arbitrary computation, yet still suffer from access pattern leakage. We propose Opaque, a distributed data analytics platform supporting a wide range of queries while providing strong security(More)
A total of 20 consecutive patients with knee stiffness post total knee arthroplasty (TKA) underwent arthroscopic lysis of adhesions and manipulation plus indwelling epidural were evaluated retrospectively. Epidural catheters were placed preoperatively for an intended 6 weeks of postoperative analgesia to facilitate intensive physical therapy. The mean loss(More)