The TianHe-1A Supercomputer: Its Hardware and Software

  • Xuejun Yang, Xiangke Liao, Kai Lu, Qingfeng Hu, Junqiang Song, Jinshu Su
  • Journal of Computer Science and Technology
This paper presents an overview of the TianHe-1A (TH-1A) supercomputer, built by the National University of Defense Technology of China (NUDT). TH-1A adopts a hybrid architecture that integrates CPUs and GPUs, and its interconnect is a proprietary high-speed communication network. The theoretical peak performance of TH-1A is 4700 TFlops, and its LINPACK test result is 2566 TFlops. It was ranked No. 1 on the TOP500 list released in November 2010. TH-1A is now deployed in National…

MilkyWay-2 supercomputer: system and application

The key architecture features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity-off-the-shelf processors and accelerators that share similar instruction set architecture, powerful networks that employ proprietary interconnection chips to support the massively parallel message-passing communications and intelligent system administration.

Brief introduction of TianHe exascale prototype system

The prototype system, deployed at the National Supercomputer Center in Tianjin, has a theoretical peak performance of 3.15 Pflops and a total of 512 compute nodes, each with three proprietary CPUs called Matrix-2000+.

The Sunway TaihuLight supercomputer: system and applications

Preliminary efforts on developing and optimizing applications on the TaihuLight system are reported, focusing on key application domains, such as earth system modeling, ocean surface wave modeling, atomistic simulation, and phase-field simulation.

HLognGP: A parallel computation model for GPU clusters

A parallel computation model called HLognGP is proposed to abstract the computation and communication features of heterogeneous platforms like TH-1A. Results show that HLog3GP outperforms the other two evaluated models and models the new particularities of GPU clusters well.

A Comprehensive Approach for a Power Efficient General Purpose Supercomputer

  • M. Bach, J. Cuveland, D. Rohr
  • Computer Science
    2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
  • 2013
The LOEWE-CSC supercomputer addresses this problem by setting new standards in environmental compatibility as well as energy and cooling efficiency for high-performance and general-purpose computing, and provides a fundamental step towards cost-effective, environment-friendly exascale computing and IT operation.

OpenMC: Towards Simplifying Programming for TianHe Supercomputers

This work introduces a directive-based intra-node programming model, OpenMC, and shows that this new model can achieve ease of programming, high performance, and the degree of portability desired for heterogeneous nodes, especially those in TianHe supercomputers.

Balancing CPU-GPU Collaborative High-Order CFD Simulations on the Tianhe-1A Supercomputer

This is the first paper that reports a CPU-GPU collaborative high-order accurate aerodynamic simulation result with such a complex grid geometry. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 Tianhe-1A nodes.

An analysis of computational workloads for the ORNL Jaguar system

Analysis of science application workloads for the Jaguar Cray XT5 system during its tenure as a 2.3 petaflop supercomputer at Oak Ridge National Laboratory shows a foreshadowing of science workloads to be expected for future systems.

Analyses on Performance of Gromacs in Hybrid MPI+OpenMP+CUDA Cluster

  • Ce Li, Wenbo Chen, Yang Zhang, Qifeng Bai
  • Computer Science
    2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS)
  • 2014
Protein simulation systems of two different sizes are used as GROMACS input for molecular simulation at different parallel granularities on a mixed cluster based on Intel Xeon 5650 and NVIDIA C2050, in order to find the best mechanism for a hybrid CPU-GPU cluster and to analyze the advantages of the MPI+OpenMP+CUDA hybrid parallel programming pattern.

Customizing the HPL for China accelerator

This work proposes the orchestrating algorithm for matrix multiplication (OAMM) to enhance the efficiency of the heterogeneous system composed of CPU and China accelerator and validates DPEM, OPTVEC and OAMM.



A 64-bit stream processor architecture for scientific applications

The design and implementation of a 64-bit stream processor, FT64 (Fei Teng 64), for scientific computing are presented, and a novel stream programming language, SF95 (Stream FORTRAN95), and its compiler, SF95Compiler, are developed to facilitate the development of scientific applications.

Bounding energy consumption in large-scale MPI programs

A system that determines a bound on the energy savings for an application is developed and applied to three scientific programs, two of which exhibit load imbalance: particle simulation and UMT2K.

NVIDIA CUDA software and GPU parallel computing architecture

This talk will describe NVIDIA's massively multithreaded computing architecture and CUDA software for GPU computing, a scalable, highly parallel architecture that delivers high throughput for data-intensive processing.

Energy-Efficient Cloud Computing

Methods and technologies currently used for energy-efficient operation of computer hardware and network infrastructure are surveyed, and some of the remaining key research challenges that arise when such energy-saving techniques are extended to cloud computing environments are identified.

