Corpus ID: 2287828

An Evaluation of the ORNL Cray XT 3

  title={An Evaluation of the ORNL Cray XT 3},
  author={Sadaf R. Alam and Richard F. Barrett and Mark R. Fahey and Jeffery A. Kuehn and O. E. Bronson Messer and Richard T. Mills and Philip C. Roth and Jeffrey S. Vetter and Patrick H. Worley},
In 2005, Oak Ridge National Laboratory received delivery of a 5,294-processor Cray XT3. The XT3 is Cray's third-generation massively parallel processing system. The ORNL system uses a single-processor node built around the AMD Opteron and a custom chip, called SeaStar, for interprocessor communication. The system runs a lightweight operating system called Catamount on its compute nodes. This paper provides a performance evaluation of the Cray XT3, including measurements for microbenchmark…
Cray XT4: an early evaluation for petascale scientific simulation
  • S. Alam, J. Kuehn, +4 authors P. Worley
  • Computer Science
  • Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)
  • 2007
An evaluation of the Cray XT4 is presented using micro-benchmarks to develop a controlled understanding of individual system components, providing the context for analyzing and comprehending the performance of several petascale-ready applications.
Performance analysis and projections for Petascale applications on Cray XT series systems
The impact of key changes that occurred during the dual-core to quad-core processor upgrade on application behavior is evaluated, and projections are provided for next-generation massively parallel platforms with multi-core processors, specifically for the proposed Petascale Cray XT5 system.
Performance Evaluation of the Intel Sandy Bridge Based NASA Pleiades Using Scientific and Engineering Applications
We present a performance evaluation of Pleiades based on the Intel Xeon E5-2670 processor, a fourth-generation eight-core Sandy Bridge architecture, and compare it with the previous third-generation …
Characterizing the I/O behavior of scientific applications on the Cray XT
  • P. Roth
  • Computer Science
  • PDSW '07
  • 2007
This paper presents an approach for characterizing the I/O demands of applications on the Cray XT, along with preliminary case studies showing the use of the I/O characterization infrastructure with climate and combustion simulation programs.
Characterizing Parallel Scaling of Scientific Applications using IPM
Scientific applications will have to scale to many thousands of processor cores to reach petascale. Therefore it is crucial to understand the factors that affect their scalability. Here we examine …
Impact of multicores on large-scale molecular dynamics simulations
This work investigates the impact of resource contention on three scalable molecular dynamics suites: AMBER (PMEMD module), LAMMPS, and NAMD, and reveals the factors that can inhibit scaling and performance efficiency on emerging multicore processors.
Sensitivity Analysis of Biomolecular Simulations using Symbolic Models
A technique for symbolically modeling the communication patterns of production-level scientific applications, in order to study workload growth rates and carry out sensitivity analysis, is developed and applied to the particle mesh Ewald (PME) implementation in the sander package of the AMBER framework.
On the Path to Enable Multi-scale Biomolecular Simulations on PetaFLOPS Supercomputer with Multi-core Processors
  • S. Alam, P. Agarwal
  • Computer Science
  • 2007 IEEE International Parallel and Distributed Processing Symposium
  • 2007
It is concluded that not only do biomolecular simulations need to be aware of the underlying multi-core hardware to achieve maximum performance, but the system software also needs to provide processor and memory placement features on high-end systems.


Cray X1 Evaluation Status Report
Results of the micro-benchmarks and kernel benchmarks show the Cray X1 architecture to be exceptionally fast for most operations, with the best results on large problems that cannot fit entirely into the processors' caches.
Early Evaluation of the Cray X1
This paper describes the initial evaluation of the X1 architecture, focusing on microbenchmarks, kernels, and application codes that highlight the performance characteristics of the X1 architecture and indicate how to use the system most efficiently.
A TeraFLOP supercomputer in 1996: the ASCI TFLOP system
The hardware and software design of the ASCI TFLOP supercomputer is described; the project accelerates the development of new scalable supercomputers, resulting in a TeraFLOP computer before the end of 1996.
Performance evaluation of the SGI Altix 3700
It is found that the Altix provides many advantages over other non-vector machines and is competitive with the Cray X1 on a number of kernels and applications, and its globally shared memory allows users convenient parallelization with OpenMP or pthreads.
Architectural specification for massively parallel computers: an experience and measurement‐based approach
This paper describes the hardware and software architecture of the Red Storm system developed at Sandia National Laboratories, and presents a comparison of benchmarks and application performance that support the approach of leveraging high‐volume, mass‐market commodity processors.
Massively parallel computing using commodity components
The design goals of the cluster and an approach to developing a commodity-based computational resource capable of delivering performance comparable to production-level MPP machines are presented.
Synchronization and communication in the T3E multiprocessor
The T3E augments the memory interface of the DEC 21164 microprocessor with a large set of explicitly-managed, external registers (E-registers), which provide a rich set of atomic memory operations and a flexible, user-level messaging facility.
Introduction to the HPC Challenge Benchmark Suite
The HPC Challenge benchmark suite is designed to augment the Top500 list, providing benchmarks that bound the performance of many real applications as a function of memory access characteristics (e.g., spatial and temporal locality), and providing a framework for including additional tests.
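To make the locality contrast concrete, here is a minimal sketch of two access patterns at opposite ends of that range: a STREAM-style triad (unit stride, high spatial locality) and XOR updates at pseudo-random table indices in the style of the RandomAccess test (little spatial or temporal locality). The helper names are hypothetical, the generator only mimics the shift/XOR form of the HPCC one, and pure Python mostly times interpreter overhead rather than memory traffic, so this illustrates access patterns and byte accounting, not real performance.

```python
from array import array

MASK64 = (1 << 64) - 1
POLY = 0x7  # feedback constant for the shift/XOR generator below


def triad(b, c, scalar):
    """STREAM-style triad a[i] = b[i] + scalar * c[i]: unit-stride access."""
    return array("d", (bi + scalar * ci for bi, ci in zip(b, c)))


def triad_bytes(n):
    """Bytes counted for one triad pass: read b, read c, write a (8-byte doubles)."""
    return 3 * 8 * n


def next_random(ran):
    """One step of a 64-bit shift/XOR generator (RandomAccess-style, assumed)."""
    high_bit_set = ran >> 63  # sign bit if interpreted as a signed 64-bit value
    return ((ran << 1) & MASK64) ^ (POLY if high_bit_set else 0)


def random_updates(table, n_updates, seed=1):
    """XOR pseudo-random values into pseudo-random slots, in place."""
    size = len(table)  # assumed to be a power of two so masking selects a slot
    ran = seed
    for _ in range(n_updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran
    return table
```

Because XOR is its own inverse, repeating the same update sequence restores the table, which gives a cheap correctness check in the spirit of RandomAccess verification.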
Practical performance portability in the Parallel Ocean Program (POP)
The design of the Parallel Ocean Program (POP) is described with an emphasis on portability, and analysis of POP performance across machines is used to characterize performance and identify improvements while maintaining portability.
MPI: The Complete Reference
MPI: The Complete Reference is an annotated manual for the latest version (1.1) of the standard that illuminates the more advanced and subtle features of MPI and covers advanced issues in parallel computing and programming such as true portability, deadlock, high-performance message passing, and libraries for distributed and parallel computing.