The TianHe-1A Supercomputer: Its Hardware and Software

@article{Yang2011TheTS,
  title={The TianHe-1A Supercomputer: Its Hardware and Software},
  author={Xuejun Yang and Xiangke Liao and Kai Lu and Qingfeng Hu and Jun-qiang Song and Jinshu Su},
  journal={Journal of Computer Science and Technology},
  year={2011},
  volume={26},
  pages={344--351}
}
This paper presents an overview of the TianHe-1A (TH-1A) supercomputer, which was built by the National University of Defense Technology of China (NUDT). TH-1A adopts a hybrid architecture that integrates CPUs and GPUs, and its interconnect is a proprietary high-speed communication network. The theoretical peak performance of TH-1A is 4700 TFlops, and its LINPACK result is 2566 TFlops. It was ranked No. 1 on the TOP500 list released in November 2010. TH-1A is now deployed in National…
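The two performance figures in the abstract imply a LINPACK efficiency (Rmax / Rpeak) of about 54.6%. A minimal sketch of that arithmetic, using only the numbers quoted above; the function name `linpack_efficiency` is illustrative, not from the paper:

```python
def linpack_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Return sustained LINPACK performance (Rmax) as a fraction of theoretical peak (Rpeak)."""
    if rpeak_tflops <= 0:
        raise ValueError("peak performance must be positive")
    return rmax_tflops / rpeak_tflops


# Figures quoted in the TH-1A abstract: Rpeak = 4700 TFlops, Rmax = 2566 TFlops.
eff = linpack_efficiency(2566.0, 4700.0)
print(f"TH-1A LINPACK efficiency: {eff:.1%}")  # prints "TH-1A LINPACK efficiency: 54.6%"
```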
MilkyWay-2 supercomputer: system and application
TLDR
The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes that integrate commodity off-the-shelf processors and accelerators sharing a similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communication, and intelligent system administration.
Brief introduction of TianHe exascale prototype system
TLDR
The prototype system, deployed at the National Supercomputer Center in Tianjin, has a theoretical peak performance of 3.15 Pflops and a total of 512 compute nodes, each containing three proprietary CPUs called Matrix-2000+.
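A back-of-the-envelope breakdown of the prototype's peak, assuming (the summary does not say this) that the 3.15 Pflops is spread evenly across the 512 nodes and comes entirely from the three Matrix-2000+ processors in each node. The helper name is hypothetical:

```python
PEAK_PFLOPS = 3.15    # system theoretical peak quoted in the summary
NODES = 512           # compute nodes
CPUS_PER_NODE = 3     # Matrix-2000+ processors per node


def per_unit_peak_tflops(peak_pflops: float, nodes: int, cpus_per_node: int) -> tuple[float, float]:
    """Return (per-node, per-CPU) peak in TFlops.

    Assumption: peak is divided evenly and attributed only to the CPUs.
    """
    per_node = peak_pflops * 1000.0 / nodes  # 1 Pflops = 1000 TFlops
    return per_node, per_node / cpus_per_node


node_tf, cpu_tf = per_unit_peak_tflops(PEAK_PFLOPS, NODES, CPUS_PER_NODE)
print(f"~{node_tf:.2f} TFlops per node, ~{cpu_tf:.2f} TFlops per Matrix-2000+")
```

Under that assumption, each node peaks at roughly 6.15 TFlops and each Matrix-2000+ at roughly 2.05 TFlops.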
The Sunway TaihuLight supercomputer: system and applications
TLDR
Preliminary efforts on developing and optimizing applications on the TaihuLight system are reported, focusing on key application domains, such as earth system modeling, ocean surface wave modeling, atomistic simulation, and phase-field simulation.
HLognGP: A parallel computation model for GPU clusters
TLDR
A parallel computation model called HLognGP is proposed to abstract the computation and communication features of heterogeneous platforms such as TH-1A; evaluation shows that HLog3GP outperforms the other two evaluated models and accurately captures the new characteristics of GPU clusters.
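The HLognGP equations are not given in this summary. For orientation only, the sketch below implements the classic LogGP point-to-point cost, from which the model's name appears to derive: sending a k-byte message costs sender overhead o, plus (k - 1) per-byte gaps G, plus wire latency L, plus receiver overhead o. This is not HLognGP itself, and the parameter values are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    L: float  # network latency (microseconds)
    o: float  # per-message CPU overhead, paid at sender and again at receiver (us)
    g: float  # minimum gap between consecutive messages (us); unused for a single message
    G: float  # gap per byte for long messages (us/byte)


def loggp_message_time(p: LogGPParams, k: int) -> float:
    """End-to-end time of one k-byte point-to-point message under LogGP:
    o + (k - 1) * G + L + o."""
    if k < 1:
        raise ValueError("message size must be at least 1 byte")
    return p.o + (k - 1) * p.G + p.L + p.o


# Hypothetical parameters, chosen only to illustrate the formula.
params = LogGPParams(L=1.5, o=0.5, g=1.0, G=0.001)
print(loggp_message_time(params, 1))
print(loggp_message_time(params, 4097))
```

A heterogeneous extension would, presumably, need separate parameter sets for CPU-side and GPU-side transfers; the sketch keeps a single homogeneous set.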
A Comprehensive Approach for a Power Efficient General Purpose Supercomputer
  • M. Bach, J. Cuveland, +8 authors D. Rohr
  • Computer Science
  • 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
  • 2013
TLDR
The LOEWE-CSC supercomputer addresses the problem of power consumption by setting new standards in environmental compatibility as well as energy and cooling efficiency for high-performance and general-purpose computing, and provides a fundamental step towards cost-effective, environment-friendly exascale computing and IT operation.
OpenMC: Towards Simplifying Programming for TianHe Supercomputers
TLDR
This work introduces a directive-based intra-node programming model, OpenMC, and shows that this new model can achieve ease of programming, high performance, and the degree of portability desired for heterogeneous nodes, especially those in TianHe supercomputers.
Balancing CPU-GPU Collaborative High-Order CFD Simulations on the Tianhe-1A Supercomputer
  • Chuanfu Xu, Lilun Zhang, +6 authors W. Liu
  • Computer Science
  • 2014 IEEE 28th International Parallel and Distributed Processing Symposium
  • 2014
TLDR
This is the first paper to report a CPU-GPU collaborative high-order accurate aerodynamic simulation with such a complex grid geometry, and scalability tests show that HOSTA achieves a parallel efficiency above 60% on 1024 Tianhe-1A nodes.
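The summary does not say how the 60% figure is defined. One common strong-scaling definition measures efficiency relative to a baseline run on N_ref nodes: E = (T_ref · N_ref) / (T_N · N). A minimal sketch with hypothetical timings, not measurements from the paper (the paper may instead use weak scaling or a different baseline):

```python
def strong_scaling_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Parallel efficiency of an n-node run relative to an n_ref-node baseline.

    E = (t_ref * n_ref) / (t_n * n); E == 1.0 means perfect linear speedup.
    """
    if min(t_ref, t_n) <= 0 or min(n_ref, n) <= 0:
        raise ValueError("times and node counts must be positive")
    return (t_ref * n_ref) / (t_n * n)


# Hypothetical timings: 100 s on 64 nodes, 10 s on 1024 nodes.
e = strong_scaling_efficiency(t_ref=100.0, n_ref=64, t_n=10.0, n=1024)
print(f"{e:.1%}")  # prints "62.5%"
```

Here a 16x increase in nodes yields only a 10x speedup, so efficiency is 10/16 = 62.5%.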
An analysis of computational workloads for the ORNL Jaguar system
TLDR
Analysis of science application workloads on the Jaguar Cray XT5 system during its tenure as a 2.3-petaflop supercomputer at Oak Ridge National Laboratory foreshadows the science workloads to be expected on future systems.
Analyses on Performance of Gromacs in Hybrid MPI+OpenMP+CUDA Cluster
  • Ce Li, Wenbo Chen, Y. Zhang, Qifeng Bai
  • Computer Science
  • 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS)
  • 2014
TLDR
Two protein simulation systems of different sizes are used as GROMACS inputs to run molecular simulations at different parallel granularities on a hybrid cluster based on Intel Xeon 5650 CPUs and NVIDIA C2050 GPUs, in order to determine the best-performing scheme for the hybrid CPU-GPU cluster and to analyze the advantages of the MPI+OpenMP+CUDA hybrid parallel programming model.
Customizing the HPL for China accelerator
TLDR
This work proposes the orchestrating algorithm for matrix multiplication (OAMM) to enhance the efficiency of the heterogeneous system composed of CPU and China accelerator and validates DPEM, OPTVEC and OAMM.
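The summary does not describe how OAMM works. As a generic illustration of one basic decision in any CPU-plus-accelerator matrix multiplication, namely how much work each device receives, the sketch below statically splits the output rows in proportion to assumed sustained throughputs. This simple proportional split is not OAMM, and the function name and throughput numbers are hypothetical:

```python
def split_rows(n_rows: int, cpu_gflops: float, acc_gflops: float) -> tuple[int, int]:
    """Split n_rows of a matrix product between CPU and accelerator in
    proportion to each device's assumed sustained throughput (static partitioning).

    Returns (cpu_rows, acc_rows); the two always sum to n_rows.
    """
    if n_rows < 0 or cpu_gflops < 0 or acc_gflops < 0:
        raise ValueError("inputs must be non-negative")
    total = cpu_gflops + acc_gflops
    if total == 0:
        raise ValueError("at least one device must have non-zero throughput")
    acc_rows = round(n_rows * acc_gflops / total)
    return n_rows - acc_rows, acc_rows


# Hypothetical throughputs: accelerator assumed 3x faster than the CPU.
cpu_rows, acc_rows = split_rows(10_000, cpu_gflops=100.0, acc_gflops=300.0)
print(cpu_rows, acc_rows)  # prints "2500 7500"
```

A static split like this ignores transfer costs and run-time variation; dynamic or pipelined schemes are the usual refinements.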

References

A 64-bit stream processor architecture for scientific applications
TLDR
The design and implementation of a 64-bit stream processor for scientific computing, FT64 (Fei Teng 64), are presented, together with a novel stream programming language, SF95 (Stream FORTRAN95), and its compiler, SF95Compiler, which were developed to facilitate the development of scientific applications.
Bounding energy consumption in large-scale MPI programs
TLDR
A system is developed that determines a bound on the energy savings for an application; it is applied to three scientific programs, two of which (a particle simulation and UMT2K) exhibit load imbalance.
NVIDIA CUDA software and GPU parallel computing architecture
TLDR
This talk will describe NVIDIA's massively multithreaded computing architecture and CUDA software for GPU computing, a scalable, highly parallel architecture that delivers high throughput for data-intensive processing.
Energy-Efficient Cloud Computing
TLDR
The methods and technologies currently used for energy-efficient operation of computer hardware and network infrastructure are identified, along with some of the remaining key research challenges that arise when such energy-saving techniques are extended to cloud computing environments.
/software.intel.com/en-us/articles/intel-vtuneamplifier-xe
  • 2010
http://www.top500.org/lists
  • 2010