Interactive Launch of 16,000 Microsoft Windows Instances on a Supercomputer

@article{Jones2018InteractiveLO,
  title={Interactive Launch of 16,000 Microsoft Windows Instances on a Supercomputer},
  author={Michael Jones and Jeremy Kepner and Bradley Orchard and A. Reuther and William Arcand and David Bestor and Bill Bergeron and Chansup Byun and Vijay N. Gadepally and Michael Houle and Matthew Hubbell and Anna Klein and Lauren Milechin and Julie Mullen and Andrew Prout and Antonio Rosa and Siddharth Samsi and Charles Yee and Peter Michaleas},
  journal={2018 IEEE High Performance extreme Computing Conference (HPEC)},
  year={2018},
  pages={1-6}
}
Simulation, machine learning, and data analysis require a wide range of software which can be dependent upon specific operating systems, such as Microsoft Windows. Running this software interactively on massively parallel supercomputers can present many challenges. Traditional methods of scaling Microsoft Windows applications to run on thousands of processors have typically relied on heavyweight virtual machines that can be inefficient and slow to launch on modern manycore processors. This… 

Optimizing Xeon Phi for Interactive Data Analysis
TLDR
This paper describes matrix multiplication performance results for Matlab and GNU Octave over a variety of combinations of process counts, OpenMP threads, and Xeon Phi memory modes, and indicates that using KMP_AFFINITY=granularity=fine, taskset pinning, and all2all cache memory mode allows both Matlab and GNU Octave to achieve 66% of the practical peak performance.
DBOS: A DBMS-oriented Operating System
TLDR
It is shown herein that such a database OS can do scheduling, file management, and inter-process communication with performance competitive with existing systems, while also providing significantly better analytics and a dramatic reduction in code complexity.

References

Interactive Supercomputing on 40,000 Cores for Machine Learning and Data Analysis
TLDR
This work demonstrates launching 32,000 TensorFlow processes in 4 seconds and 262,000 Octave processes in 40 seconds, which allow researchers to rapidly explore novel machine learning architecture and data analysis algorithms.
HPC On Dec Alphas And Windows NT
TLDR
A dedicated computational cluster of eight DEC Alpha systems interconnected by 100 Hz switched Ethernet and running Digital Visual FORTRAN on Windows NT is obtained and is now able to run mainstream UK HPC codes such as ANGUS.
Achieving 100,000,000 database inserts per second using Accumulo and D4M
TLDR
The Apache Accumulo database is an open source relaxed consistency database that is widely used for government applications and has a peak performance of over 100,000,000 database inserts per second which is 100× larger than the highest previously published value for any other database.
LLSuperCloud: Sharing HPC systems for diverse rapid prototyping
TLDR
LLSuperCloud reverses the traditional paradigm of attempting to deploy supercomputing capabilities on a cloud and instead deploys cloud capability on a supercomputer, resulting in a system that can handle heterogeneous, massively parallel workloads while also providing high performance elastic computing, virtualization, and databases.
Scalable System Scheduling for HPC and Big Data
High Performance Computing with Microsoft Windows 2000
  • D. Lifka
  • Computer Science
    Proceedings 42nd IEEE Symposium on Foundations of Computer Science
  • 2001
TLDR
The experiences and issues CTC had moving from proprietary Unix-based systems to industry-standard Microsoft Windows 2000 systems are discussed.
LLMapReduce: Multi-level map-reduce for high performance data analysis
TLDR
LLMapReduce dramatically simplifies map-reduce programming by providing simple parallel programming capability in one line of code, and can overcome scaling limits in the map-reduce parallel programming model via options that allow the user to switch to the more efficient single-program-multiple-data (SPMD) parallel programming model.
Scheduler technologies in support of high performance data analysis
TLDR
Job latency is critical for the efficient utilization of scalable computing infrastructures, and this paper presents the results of job launch benchmarking of several current schedulers: Slurm, Son of Grid Engine, Mesos, and Yarn, finding that all of these schedulers have low utilization for short-running jobs.
Scalability of VM provisioning systems
TLDR
The startup performance overhead of three of the most mature, widely deployed cloud management frameworks is measured to determine their suitability for workloads typically seen in an HPC environment.
Enabling on-demand database computing with MIT SuperCloud database management system
TLDR
The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB, and permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concerns for data or productivity loss.