Map/Reduce-style data-parallel computation is characterized by the extensive use of user-defined functions for data processing and relies on data-shuffling stages to prepare data partitions for parallel computation. Instead of treating user-defined functions as "black boxes", we propose to analyze those functions to turn them into "gray boxes" that…
Configuration problems are not only prevalent, but also severely impair the reliability of today's system software. One fundamental reason is the ever-increasing complexity of configuration, reflected by the large number of configuration parameters ("knobs"). With hundreds of knobs, configuring system software to ensure high reliability and performance…
To minimize the amount of data-shuffling I/O that occurs between the pipeline stages of a distributed data-parallel program, its procedural code must be optimized with full awareness of the pipeline that it executes in. Unfortunately, neither pipeline optimizers nor traditional compilers examine both the pipeline and procedural code of a data-parallel…
Data races are ubiquitous in multi-threaded applications, but they are by no means easy to detect. One of the most important reasons is the complexity of thread interleavings. A large body of research has been devoted to interleaving-insensitive detection. However, all previous work focuses on uniform detection (unknown to the characteristics of…
Transactional memory (TM) is a parallel programming concept that reduces the challenges of parallel programming. Existing distributed transactional memory systems consume too much bandwidth and incur high latency. In this work, we present Transactional Memory System for Cluster (Clustm), a generalized and scalable distributed transactional memory system. Our…
Expressing synchronization in task parallelism remains a significant challenge because of the complicated relationships between tasks. In this paper, we propose a novel parallel programming model, namely function flow, where synchronization is easier to express. We relieve the burden of synchronization by virtue of parallel functions and functional wait.…
The largest difference between a distributed and a non-distributed system is that the former introduces network messages. Network messages bring scalability to a distributed system, but also complexity. Testing large-scale distributed systems is a great challenge, because some errors happen after a distributed sequence of events…
Computer Systems and Networks, with the goal of making computer systems and networks more dependable and manageable.