PaSh: light-touch data-parallel shell processing
@article{Vasilakis2021PaShLD, title={PaSh: light-touch data-parallel shell processing}, author={Nikos Vasilakis and Konstantinos Kallas and Konstantinos Mamouras and Achilleas Benetopoulos and Lazar Cvetkovich}, journal={Proceedings of the Sixteenth European Conference on Computer Systems}, year={2021} }
This paper presents PaSh, a system for parallelizing POSIX shell scripts. Given a script, PaSh converts it to a dataflow graph, performs a series of semantics-preserving program transformations that expose parallelism, and then converts the dataflow graph back into a script---one that adds POSIX constructs to explicitly guide parallelism coupled with PaSh-provided Unix-aware runtime primitives for addressing performance- and correctness-related issues. A lightweight annotation language allows…
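As a rough illustration of the kind of data parallelism the abstract describes, the sketch below splits an input into ordered chunks, runs a stateless line filter on each chunk in parallel, and concatenates the results in the original order. This is not PaSh's implementation: the names `grep`, `split`, and `parallel_grep` are hypothetical helpers chosen for this example, and the filter merely stands in for a shell command such as `grep`.

```python
from concurrent.futures import ThreadPoolExecutor


def grep(lines, needle):
    """Stateless, line-by-line filter (a stand-in for a command like `grep`)."""
    return [line for line in lines if needle in line]


def split(lines, n):
    """Cut the input into at most n contiguous chunks, preserving order."""
    size = max(1, -(-len(lines) // n))  # ceiling division, at least 1
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_grep(lines, needle, workers=4):
    """Apply the filter to each chunk in parallel, then concatenate in order."""
    chunks = split(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so output order is kept.
        parts = list(pool.map(lambda chunk: grep(chunk, needle), chunks))
    return [line for part in parts for line in part]


# Hypothetical input: every third record is marked as an error.
lines = [f"record {i} {'error' if i % 3 == 0 else 'ok'}" for i in range(20)]
sequential = grep(lines, "error")
parallel = parallel_grep(lines, "error")
```

The rewrite is safe here only because the filter treats each line independently; a command that carries state across lines would need a different way of merging the partial results.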
11 Citations
The Once and Future Shell
- Computer Science
- 2021
Improving the UNIX shell holds much promise for development, ops, and data processing; several avenues of research building on recent advances are outlined.
An order-aware dataflow model for parallel Unix pipelines
- Computer Science
- Proc. ACM Program. Lang.
- 2021
A dataflow model for parallel Unix shell pipelines is presented, and the semantics of transformations that exploit the data parallelism available in Unix shell computations are captured and proved correct.
Automatic synthesis of parallel unix commands and pipelines with KumQuat
- Computer Science
- PPoPP
- 2022
KumQuat automatically synthesizes the combine operators, with a domain-specific combiner language acting as a strong regularizer that promotes efficient inference of correct combiners and enables the effective parallelization of the authors' benchmark scripts.
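To make the notion of a combine operator concrete, here is a minimal hand-written sketch. KumQuat synthesizes such combiners automatically; the functions below (`wc_l`, `combine_wc`, `sort_cmd`, `combine_sort`) are hypothetical and only show the property a combiner should satisfy: combining the outputs of a command run on two halves of the input gives the same result as running the command on the whole input.

```python
import heapq


def wc_l(lines):
    """Stand-in for `wc -l`: count the lines."""
    return len(lines)


def combine_wc(a, b):
    """Combiner for line counts: add the partial counts."""
    return a + b


def sort_cmd(lines):
    """Stand-in for `sort`: return the lines in sorted order."""
    return sorted(lines)


def combine_sort(a, b):
    """Combiner for sorted outputs: merge two sorted lists."""
    return list(heapq.merge(a, b))


# Hypothetical input, split into two halves.
left = ["pear", "apple"]
right = ["fig", "banana", "cherry"]
whole = left + right

count = combine_wc(wc_l(left), wc_l(right))
ordered = combine_sort(sort_cmd(left), sort_cmd(right))
```

Different commands need different combiners: addition suits counting, while sorting needs an order-aware merge.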
Unix shell programming: the next 50 years
- Computer Science
- HotOS
- 2021
This paper aims to help manage the shell's essential shortcomings (dynamism, power, and abstruseness) and address its inessential ones.
Automatic Synthesis of Parallel and Distributed Unix Commands with KumQuat
- Computer Science
- ArXiv
- 2020
We present KumQuat, a system for automatically synthesizing parallel and distributed versions of Unix shell commands. KumQuat follows a divide-and-conquer approach, decomposing commands into (i) a…
An Empirical Investigation of Command-Line Customization
- Computer Science
- Empir. Softw. Eng.
- 2022
It is conjectured that identifying common customization practices can point to particular usability issues within command-line programs, and that a deeper understanding of these practices can support researchers and tool developers in designing better user experiences.
Files-as-Filesystems for POSIX Shell Data Processing
- Computer Science
- PLOS@SOSP
- 2021
The POSIX shell is 'stringy', and its ecosystem primarily supports line-oriented formats. While such formats are popular and common, contemporary programming often involves semi-structured data, like…
Report on the "The Future of the Shell" Panel at HotOS 2021
- Environmental Science
- ArXiv
- 2021
This document summarizes the challenges and possible research directions around the shell and its ecosystem, collected during and after the HotOS21 Panel on the future of the shell. The goal is to…
The future of the shell: Unix and beyond
- Computer Science
- HotOS
- 2021
This 90-minute panel brings together researchers and engineers from disparate communities to think about the Unix shell's strengths and weaknesses, challenges and opportunities around the shell, and the shell's future.
The serverless shell
- Computer Science
- Middleware Industry
- 2021
The results show that sshell achieves comparable or better performance than a high-end server and can be faster and more cost-efficient than a cluster-based solution to mine large datasets.
References
SHOWING 1-10 OF 84 REFERENCES
Extending Unix Pipelines to DAGs
- Computer Science
- IEEE Transactions on Computers
- 2017
Dgsh was evaluated through a number of common data processing and domain-specific examples, and was found to offer an expressive way to specify processing topologies, while also generally increasing processing throughput.
RaftLib: a C++ template library for high performance stream parallel processing
- Computer Science
- PMAM@PPoPP
- 2015
RaftLib aims to fully exploit the stream processing paradigm, enabling a full spectrum of streaming graph optimizations while providing a platform for the exploration of integrability with legacy C/C++ code.
A stream compiler for communication-exposed architectures
- Computer Science
- ASPLOS X
- 2002
This paper describes a fully functional compiler that parallelizes StreamIt applications for Raw, including several load-balancing transformations, and demonstrates that the StreamIt compiler can automatically map a high-level stream abstraction to Raw without losing performance.
Safe Data Parallelism for General Streaming
- Computer Science
- IEEE Transactions on Computers
- 2015
This article presents a compiler and runtime system that automatically extracts data parallelism for general stream processing, and shows linear scalability for parallel regions that are computation-bound, and near-linear scalability when tuples are shuffled across parallel regions.
Naiad: a timely dataflow system
- Computer Science
- SOSP
- 2013
It is shown that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining.
Interprocedural dependence analysis and parallelization
- Computer Science
- SIGP
- 2004
A method is presented that combines a deep analysis of program dependences with a broad analysis of the interaction among procedures, and a unified approach that integrates subscript analysis with aliasing and interprocedural information is presented.
Acute: high-level programming language design for distributed computation
- Computer Science
- ICFP '05
- 2005
An experimental language is described, Acute, which extends an ML core to support distributed development, deployment, and execution, allowing type-safe interaction between separately-built programs.
Optimistic parallelism requires abstractions
- Computer Science
- PLDI '07
- 2007
It is shown that Delaunay mesh generation and agglomerative clustering can be parallelized in a straightforward way using the Galois approach, and results suggest that Galois is a practical approach to exploiting data parallelism in irregular programs.
Brook for GPUs: stream computing on graphics hardware
- Computer Science
- ACM Trans. Graph.
- 2004
This paper presents Brook for GPUs, a system for general-purpose computation on programmable graphics hardware that abstracts and virtualizes many aspects of graphics hardware, and presents an analysis of the effectiveness of the GPU as a compute engine compared to the CPU.
The implementation of the Cilk-5 multithreaded language
- Computer Science
- PLDI 1998
- 1998
Cilk-5's novel "two-clone" compilation strategy and its Dijkstra-like mutual-exclusion protocol for implementing the ready deque in the work-stealing scheduler are presented.