Learn More
Scientists, engineers, and statisticians must execute domain-specific application programs many times on large collections of file-based data. This activity requires complex orchestration and data management as data is passed to, from, and among application invocations. Distributed and parallel computing resources can accelerate such processing, but their(More)
In this paper we present the Coaster System. It is an automatically-deployed node provisioning (Pilot Job) system for grids, clouds, and ad-hoc desktop-computer networks supporting file staging, on-demand opportunistic multi-node allocation, remote logging, and remote monitoring. The Coaster System has been previously [32] shown to work at scales of(More)
Sharing data and storage space in a distributed system remains a difficult task for ordinary users, who are constrained to the fixed abstractions and resources provided by administrators. To remedy this situation, we introduce the concept of a tactical storage system (TSS) that separates storage abstractions from storage resources, leaving users free to(More)
—Many scientific applications are conceptually built up from independent component tasks as a parameter study, optimization, or other search. Large batches of these tasks may be executed on high-end computing systems; however, the coordination of the independent processes, their data, and their data dependencies is a significant scalability challenge. Many(More)
Scripting is often used in science to create applications via the composition of existing programs. Parallel scripting systems allow the creation of such applications, but each system introduces the need to adopt a somewhat specialized programming model. We present an alternative scripting approach, AMFS Shell, that lets programmers express parallel(More)
—High-performance computing (HPC) and distributed systems rely on a diverse collection of system software to provide application services, including file systems, schedulers, and web services. Such system software services must manage highly concurrent requests, interact with a wide range of resources, and scale well in order to be successful.(More)
Over the past few years, the increasing amounts of data produced by large-scale simulations have motivated a shift from traditional offline data analysis to in situ analysis and visualization. In situ processing began as the coupling of a parallel simulation with an analysis or visualization library, motivated primarily by avoiding the high cost of(More)
Tcl is the original embeddable dynamic language. Introduced in 1990, Tcl has been the foundation of the scripting interface of the popular biomolecular visualization and analysis program VMD since 1995 and was extended to the parallel molecular dynamics program NAMD in 1999. The two programs together have over 200,000 users who have enjoyed for nearly two(More)
We seek to enable efficient large-scale parallel execution of applications in which a shared filesystem abstraction is used to couple many tasks. Such parallel scripting (<i>many-task computing, MTC</i>) applications suffer poor performance and utilization on large parallel computers because of the volume of filesystem I/O and a lack of appropriate(More)
Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely-coupled coarse-grained tasks, each comprising a tightly-coupled parallel function or program. "Many-task" programming models such as functional(More)