3D Real-Time Supercomputer Monitoring

Bill Bergeron, Matthew Hubbell, Dylan Sequeira, Winter Williams, William Arcand, David Bestor, Chansup Byun, Vijay N. Gadepally, Michael Houle, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, A. Reuther, Antonio Rosa, Siddharth Samsi, Charles Yee, Jeremy Kepner
2021 IEEE High Performance Extreme Computing Conference (HPEC)
Supercomputers are complex systems that produce vast quantities of performance data of varying types from multiple sources. Performance data from each of the thousands of nodes in a supercomputer tracks multiple forms of storage, memory, networks, processors, and accelerators. Optimizing application performance is critical for cost-effective use of a supercomputer and requires efficient methods for viewing performance data. The combination of supercomputing analytics and 3D…

Optimizing the Visualization Pipeline of a 3-D Monitoring and Management System
This paper shows how Accumulo, D4M, and Unity are used to build a 3D visualization platform for monitoring and managing the Lincoln Laboratory supercomputer systems, and how the approach had to be retooled to scale with those systems.
D4M: Bringing associative arrays to database engines
The process of building the D4M-SciDB connector is described, and the current performance of this connection is presented to showcase how new databases may be supported by D4M.
Driving big data with big compute
The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds, including LLGrid MapReduce, which allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster.
HPC-VMs: Virtual machines in high performance computing systems
This paper analyzes the effectiveness of using virtual machines in a high performance computing (HPC) environment, and proposes adding some virtual machine capability to already robust HPC environments for specific scenarios where the productivity gained outweighs the performance lost from using virtual machines.
Achieving 100,000,000 database inserts per second using Accumulo and D4M
The Apache Accumulo database is an open-source, relaxed-consistency database that is widely used for government applications; it has a peak performance of over 100,000,000 database inserts per second, which is 100× larger than the highest previously published value for any other database.
Big Data strategies for Data Center Infrastructure management using a 3D gaming platform
This paper demonstrates a system in which Big Data strategies and 3D gaming technology are leveraged to successfully monitor and analyze multiple HPC systems and a lights-out modular HP EcoPOD 240a Data Center on a single platform.
Enabling on-demand database computing with MIT SuperCloud database management system
The MIT SuperCloud database management system allows for rapid creation and flexible execution of a variety of the latest scientific databases, including Apache Accumulo and SciDB, and permits snapshotting of databases to allow researchers to experiment and push the limits of the technology without concerns for data or productivity loss.
Scalability of VM provisioning systems
The startup performance overhead of three of the most mature, widely deployed cloud management frameworks is measured to determine their suitability for workloads typically seen in an HPC environment.
Dynamic distributed dimensional data model (D4M) database and computation system
  • J. Kepner, W. Arcand, +13 authors Charles Yee
  • Computer Science
    2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2012
D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language “SQL” databases), making it possible to create composable analytics with significantly less effort than traditional approaches.
Large scale network situational awareness via 3D gaming technology
  • M. Hubbell, J. Kepner
  • Engineering, Computer Science
    2012 IEEE Conference on High Performance Extreme Computing
  • 2012
This paper develops a 3D environment of the physical plant in the format of a networked multiplayer first-person shooter to provide a virtual depiction of the current state of the network and the machines operating on it.