Automatic synthesis of parallel unix commands and pipelines with KumQuat

  • Jiasi Shen, Martin C. Rinard, Nikos Vasilakis
  • Published 31 December 2020
  • Computer Science
  • Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
We present KumQuat, a system for automatically generating data-parallel implementations of Unix shell commands and pipelines. The generated parallel versions split input streams, execute multiple instantiations of the original pipeline commands to process the splits in parallel, then combine the resulting parallel outputs to produce the final output stream. KumQuat automatically synthesizes the combine operators, with a domain-specific combiner language acting as a strong regularizer that… 
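The split, parallel-execute, and combine structure described in the abstract can be illustrated with a minimal Python sketch. This is not KumQuat's implementation: the helper names (`split_lines`, `run_parallel`) are hypothetical, plain Python functions stand in for Unix commands such as `wc -l` and `sort`, and the combiners are written by hand here, whereas KumQuat synthesizes them automatically.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def split_lines(lines, n_parts):
    """Split a list of lines into at most n_parts contiguous chunks."""
    size = max(1, -(-len(lines) // n_parts))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def run_parallel(command, combine, lines, n_parts=4):
    """Run `command` on each chunk in parallel, then fold the outputs with `combine`."""
    chunks = split_lines(lines, n_parts)
    if not chunks:
        return command([])
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        outputs = list(pool.map(command, chunks))  # map preserves chunk order
    result = outputs[0]
    for out in outputs[1:]:
        result = combine(result, out)
    return result


# Hypothetical stand-ins for `wc -l` and `sort` (assumptions, not KumQuat code).
def count_lines(lines):
    return len(lines)


def sort_lines(lines):
    return sorted(lines)


# Hand-written combiners: addition for counts, ordered merge for sorted output.
def add(a, b):
    return a + b


def merge_sorted(a, b):
    return list(merge(a, b))


data = ["pear", "apple", "fig", "kiwi", "date", "plum", "lime"]
total = run_parallel(count_lines, add, data, n_parts=3)
ordered = run_parallel(sort_lines, merge_sorted, data, n_parts=3)
```

The sketch shows why the combiner depends on the command: line counts combine by addition, while sorted chunks need an order-preserving merge rather than plain concatenation.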
Supply-Chain Vulnerability Elimination via Active Learning and Regeneration
This paper presents Harp, an ALR (active learning and regeneration) system for string processing components, and demonstrates that Harp can eliminate vulnerabilities associated with libraries targeted in several highly visible security incidents, specifically event-stream, left-pad, and string-compare.


Automatic Parallelization of Recursive Procedures
A new framework for automatically parallelizing recursive procedures that typically appear in divide-and-conquer algorithms, and novel techniques for speculative runtime parallelization, which are more efficient and powerful in this context than analogous techniques proposed previously for speculatively parallelizing loops.
Extending Unix Pipelines to DAGs
Dgsh was evaluated through a number of common data processing and domain-specific examples, and was found to offer an expressive way to specify processing topologies, while also generally increasing processing throughput.
Modular divide-and-conquer parallelization of nested loops
Experimental results demonstrate that the proposed methodology for automatic generation of divide-and-conquer parallel implementations of sequential nested loops can parallelize highly non-trivial loop nests efficiently.
Synthesis of divide and conquer parallelism for loops
Theoretical results for when the necessary modifications to sequential code are possible, theoretical guarantees for the algorithmic solutions presented here, and an experimental evaluation of the approach's success in practice and the quality of the produced parallel programs are presented.
MapReduce program synthesis
This paper presents a new algorithm and tool for synthesizing programs composed of efficient data-parallel operations that can execute on cloud computing infrastructure and demonstrates the efficiency of the approach and the small number of examples it requires to synthesize correct, scalable programs.
Macho: Programming with Man Pages
Macho, a system that combines a natural language parser, a database of code, and an automated debugger to write simple programs from natural language and examples of their correct execution, is described.
Synthesis of UNIX Programs Using Derivational Analogy
This paper describes APU, a derivational-analogy-based system that synthesizes UNIX shell scripts from a high-level problem specification, along with its retrieval heuristics, which automatically retrieve a good analog for a target problem from a case library, and its replay algorithm, which reuses the solution of an analogous problem to derive a solution for a new problem.
Dryad: distributed data-parallel programs from sequential building blocks
The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
Automatic parallelization of divide and conquer algorithms
This paper presents the design and implementation of a compiler that is designed to parallelize divide and conquer algorithms whose subproblems access disjoint regions of dynamically allocated arrays and shows that the programs perform well and exhibit good speedup.
POSH: A Data-Aware Shell
POSH, a framework that accelerates shell applications with I/O-heavy components, such as data analytics with command-line utilities, is presented and benchmarked on real shell pipelines such as image processing, network security analysis, log analysis, distributed system debugging, and git, showing speedups ranging from 1.6× to 15× compared to NFS.