Using a general polynomial approximation approach, we present an arithmetic library generator for the logarithmic number system (LNS). The generator produces optimized LNS arithmetic libraries that significantly improve on previous LNS designs in area and latency. We also provide area cost estimation and bit-accurate simulation tools that facilitate …
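For readers new to LNS, the sketch below shows the kind of arithmetic such a library implements. A value is stored as a sign and the base-2 logarithm of its magnitude, so multiplication reduces to adding logarithms, while addition and subtraction need the nonlinear functions s+(d) = log2(1 + 2^-d) and s-(d) = log2(1 - 2^-d). These nonlinear functions are what polynomial approximation targets in hardware. This is a minimal floating-point sketch of the textbook formulation, not the generator's output: the class and function names are hypothetical, real LNS units use fixed-point logarithms, and zero and exact cancellation are not handled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LNS:
    """Nonzero real number stored as a sign bit plus log2 of its magnitude (hypothetical type)."""
    sign: int   # 0 for positive, 1 for negative
    log: float  # log2(|value|); fixed-point in hardware, float here for clarity

    @staticmethod
    def from_float(x: float) -> "LNS":
        if x == 0:
            raise ValueError("zero needs a special encoding in LNS; not handled here")
        return LNS(0 if x > 0 else 1, math.log2(abs(x)))

    def to_float(self) -> float:
        return (-1.0) ** self.sign * 2.0 ** self.log


def s_plus(d: float) -> float:
    """Addition function log2(1 + 2^-d) for d >= 0."""
    return math.log2(1.0 + 2.0 ** (-d))


def s_minus(d: float) -> float:
    """Subtraction function log2(1 - 2^-d) for d > 0."""
    return math.log2(1.0 - 2.0 ** (-d))


def lns_mul(a: LNS, b: LNS) -> LNS:
    # Multiplication needs no approximation: add logarithms, XOR the signs.
    return LNS(a.sign ^ b.sign, a.log + b.log)


def lns_add(a: LNS, b: LNS) -> LNS:
    # Order operands so that |a| >= |b|; then d = log2|a| - log2|b| >= 0.
    if b.log > a.log:
        a, b = b, a
    d = a.log - b.log
    if a.sign == b.sign:
        # |a| + |b| = |a| * (1 + 2^-d)
        return LNS(a.sign, a.log + s_plus(d))
    if d == 0:
        raise ValueError("exact cancellation yields zero; not handled here")
    # |a| - |b| = |a| * (1 - 2^-d); the result takes the sign of the larger operand
    return LNS(a.sign, a.log + s_minus(d))
```

For example, `lns_add(LNS.from_float(3.0), LNS.from_float(5.0)).to_float()` gives 8.0 up to floating-point rounding. In hardware, s+ and s- are the expensive parts, which is why their approximation dominates area and latency.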
Developing highly scalable algorithms for global atmospheric modeling is becoming increasingly important as scientists seek to understand the behavior of the global atmosphere at extreme scales. Heterogeneous architectures that combine processors and accelerators are becoming an important solution for large-scale computing. However, large-scale …
This paper proposes optimizations of the methods and parameters used in both the mathematical approximation and the hardware design of logarithmic number system (LNS) arithmetic. First, we introduce a general polynomial approximation approach with an adaptive divide-in-halves segmentation method for evaluating LNS arithmetic functions. Second, we develop a …
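The divide-in-halves idea can be sketched as a recursive procedure: fit a low-degree polynomial on an interval, estimate its error, and split the interval into two equal halves whenever the error exceeds the target. The sketch below applies this to s+(d) = log2(1 + 2^-d) on [0, 8]. It rests on several assumptions not taken from the paper: it uses interpolation at Chebyshev nodes as a stand-in for the paper's fitting method, estimates the maximum error by dense sampling rather than a rigorous bound, and works in floating point. `divide_in_halves`, `evaluate`, and their parameters are hypothetical names.

```python
import math
from typing import Callable, List, Tuple

# (lo, hi, interpolation nodes, Newton divided-difference coefficients)
Segment = Tuple[float, float, List[float], List[float]]


def chebyshev_nodes(lo: float, hi: float, degree: int) -> List[float]:
    """degree+1 Chebyshev nodes of the first kind, mapped from [-1, 1] to [lo, hi]."""
    n = degree + 1
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    return [mid + half * math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]


def newton_coeffs(xs: List[float], ys: List[float]) -> List[float]:
    """Divided-difference coefficients of the interpolating polynomial."""
    c = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - j])
    return c


def newton_eval(xs: List[float], c: List[float], x: float) -> float:
    """Evaluate the Newton-form polynomial with a Horner-like recurrence."""
    p = c[-1]
    for i in range(len(c) - 2, -1, -1):
        p = p * (x - xs[i]) + c[i]
    return p


def divide_in_halves(f: Callable[[float], float], lo: float, hi: float,
                     degree: int, tol: float, min_width: float = 1e-9,
                     samples: int = 64) -> List[Segment]:
    """Recursively halve [lo, hi] until each piece meets the (sampled) error target."""
    xs = chebyshev_nodes(lo, hi, degree)
    c = newton_coeffs(xs, [f(x) for x in xs])
    grid = [lo + (hi - lo) * k / samples for k in range(samples + 1)]
    err = max(abs(f(x) - newton_eval(xs, c, x)) for x in grid)  # sampled, not a proof
    if err <= tol or hi - lo <= min_width:
        return [(lo, hi, xs, c)]
    mid = 0.5 * (lo + hi)
    return (divide_in_halves(f, lo, mid, degree, tol, min_width, samples)
            + divide_in_halves(f, mid, hi, degree, tol, min_width, samples))


def evaluate(segments: List[Segment], x: float) -> float:
    """Find the segment covering x by linear search and evaluate its polynomial."""
    for lo, hi, xs, c in segments:
        if lo <= x <= hi:
            return newton_eval(xs, c, x)
    raise ValueError("x lies outside the approximated range")


def s_plus(d: float) -> float:
    return math.log2(1.0 + 2.0 ** (-d))


segments = divide_in_halves(s_plus, 0.0, 8.0, degree=2, tol=1e-6)
print(len(segments), "segments")
```

Because every segment width is the full range divided by a power of two, the segments form a binary tree over the input range. That property plausibly makes halving attractive for hardware, since the covering segment can be located by inspecting leading bits of d rather than by the linear search used in `evaluate`.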
This paper presents a hybrid algorithm for the petascale global simulation of atmospheric dynamics on Tianhe-2, the world's top-ranked supercomputer at the time of writing, developed by China's National University of Defense Technology (NUDT). Tianhe-2 is equipped with both Intel Xeon CPUs and Intel Xeon Phi accelerators. A key idea of the hybrid algorithm is to enable …
Memory-related constraints (memory bandwidth, cache size) have become the performance bottleneck of most computational applications. On multicore processors in particular, performance often fails to scale with the number of cores. In this work, we present an FPGA-based solution for the 3D Reverse Time Migration (RTM) algorithm. As the …
The Sunway TaihuLight supercomputer is the world’s first system with a peak performance greater than 100 PFlops. In this paper, we provide a detailed introduction to the TaihuLight system. In contrast with other existing heterogeneous supercomputers, which include both CPU processors and PCIe-connected many-core accelerators (NVIDIA GPU or Intel Xeon Phi), …
Understanding the inherent characteristics of a system is crucial to the design and optimization of cloud storage systems, yet few studies have systematically investigated their data characteristics and access patterns. This paper presents an analysis of a file system snapshot and a five-month access trace of a campus cloud storage system that has been deployed on …
One of the most essential and challenging components of a climate system model is the atmospheric model. To solve the multi-physics atmospheric equations, developers must deal with extremely complex stencil kernels. In this paper, we propose a hybrid CPU-FPGA algorithm that uses single and multiple FPGAs to compute the upwind stencil for the global …
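To illustrate what an upwind stencil is, the sketch below applies a first-order upwind scheme to 1D linear advection, u_t + a u_x = 0, on a periodic grid: each point is updated using the neighbor on the side the flow comes from. It is a deliberately minimal stand-in, far simpler than the multi-physics kernel described in the paper; `upwind_step` and its parameters are hypothetical.

```python
from typing import List


def upwind_step(u: List[float], a: float, dt: float, dx: float) -> List[float]:
    """One explicit first-order upwind step for u_t + a*u_x = 0 on a periodic 1D grid."""
    c = a * dt / dx  # Courant number; this scheme is stable only for |c| <= 1
    if abs(c) > 1.0:
        raise ValueError("CFL condition violated: need |a*dt/dx| <= 1")
    n = len(u)
    if a >= 0:
        # Flow from the left: backward difference u[i] - u[i-1] (u[-1] wraps around).
        return [u[i] - c * (u[i] - u[i - 1]) for i in range(n)]
    # Flow from the right: forward difference u[i+1] - u[i], wrapping at the end.
    return [u[i] - c * (u[(i + 1) % n] - u[i]) for i in range(n)]
```

With Courant number c = 1, the scheme shifts the profile exactly one cell per step; for |c| < 1 it introduces numerical diffusion, which is one reason higher-order upwind-biased stencils are often preferred in practice.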
Forward modeling of wave propagation is a widely used computational method in oil and gas exploration, and its iterative stencil loops also have broad applications in scientific computing. However, these time-consuming iterative stencil loops greatly limit exploration efficiency. In this paper, we accelerate the forward modeling on a number of different …
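The iterative stencil loop at the core of forward modeling can be sketched for the 2D constant-velocity acoustic wave equation, discretized with second-order differences in time and space: p^{n+1} = 2p^n - p^{n-1} + (v dt/dx)^2 (dx^2 ∇²p^n), using the 5-point Laplacian. This is an illustrative assumption, not the paper's kernel: production codes typically use higher-order stencils, variable velocity, source wavelets, and absorbing boundaries, whereas here the source is a single initial impulse and the boundary is held at zero. `forward_model` is a hypothetical name.

```python
from typing import List, Tuple

Grid = List[List[float]]


def forward_model(nx: int, nz: int, v: float, dx: float, dt: float,
                  nt: int, src: Tuple[int, int]) -> Grid:
    """Leapfrog time stepping of p_tt = v^2 (p_xx + p_zz) on an nz-by-nx grid.

    The wavefield starts as a unit impulse at src = (ix, iz) with (approximately)
    zero initial time derivative; p is held at zero on the outer boundary.
    """
    r2 = (v * dt / dx) ** 2
    if r2 > 0.5:  # CFL limit of this 2D stencil: v*dt/dx <= 1/sqrt(2)
        raise ValueError("unstable time step: need v*dt/dx <= 1/sqrt(2)")
    prev = [[0.0] * nx for _ in range(nz)]
    cur = [[0.0] * nx for _ in range(nz)]
    ix, iz = src
    cur[iz][ix] = 1.0
    prev[iz][ix] = 1.0  # p(-dt) = p(0): a first-order zero-velocity start
    for _ in range(nt):
        nxt = [[0.0] * nx for _ in range(nz)]  # edges stay 0 (fixed boundary)
        for z in range(1, nz - 1):
            for x in range(1, nx - 1):
                # 5-point Laplacian scaled by dx^2: the stencil applied every step
                lap = (cur[z][x + 1] + cur[z][x - 1]
                       + cur[z + 1][x] + cur[z - 1][x] - 4.0 * cur[z][x])
                nxt[z][x] = 2.0 * cur[z][x] - prev[z][x] + r2 * lap
        prev, cur = cur, nxt
    return cur
```

The nested time, z, and x loops form the iterative stencil loop whose cost the paragraph above describes. Each time step reads two previous wavefields and writes one while doing little arithmetic per point, so such kernels are typically limited by memory bandwidth rather than compute.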