Tuning the performance of sparse matrix-vector multiplication (SpMV) is important, but its irregularity makes the task difficult. In this paper, we propose a cache blocking method to improve the performance of SpMV on the emerging GPU architecture. The sparse matrix is partitioned into many sub-blocks, which are stored in CSR format. …
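The idea of partitioning a matrix into sub-blocks that are each stored in CSR can be illustrated with a small CPU-side sketch. The abstract does not say how the blocks are shaped or how they map onto GPU threads, so the column-strip partitioning below, along with the names `dense_to_csr_blocks`, `blocked_spmv`, and `block_cols`, are assumptions made purely for illustration and not the paper's method.

```python
from typing import List, Tuple

# One CSR block: (row_ptr, col_idx, values), with column indices local to the block.
CSRBlock = Tuple[List[int], List[int], List[float]]


def dense_to_csr_blocks(a: List[List[float]], block_cols: int) -> List[Tuple[int, CSRBlock]]:
    """Split a dense matrix into column strips and store each strip in CSR.

    Returns a list of (first_column_of_strip, csr_block) pairs.
    Column-strip blocking is an assumed scheme chosen for illustration.
    """
    n_cols = len(a[0]) if a else 0
    blocks = []
    for start in range(0, n_cols, block_cols):
        stop = min(start + block_cols, n_cols)
        row_ptr, col_idx, values = [0], [], []
        for row in a:
            for j in range(start, stop):
                if row[j] != 0:
                    col_idx.append(j - start)  # index relative to the strip
                    values.append(row[j])
            row_ptr.append(len(values))  # end of this row's nonzeros
        blocks.append((start, (row_ptr, col_idx, values)))
    return blocks


def blocked_spmv(blocks: List[Tuple[int, CSRBlock]], x: List[float], n_rows: int) -> List[float]:
    """Compute y = A @ x one block at a time.

    Each block reads only a contiguous slice of x, which is the property
    a cache blocking scheme relies on to keep that slice in fast memory.
    """
    y = [0.0] * n_rows
    for start, (row_ptr, col_idx, values) in blocks:
        for i in range(n_rows):
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += values[k] * x[start + col_idx[k]]
            y[i] += acc
    return y


a = [[1, 0, 2, 0],
     [0, 3, 0, 0],
     [4, 0, 0, 5]]
x = [1, 2, 3, 4]
y = blocked_spmv(dense_to_csr_blocks(a, block_cols=2), x, n_rows=len(a))
```

In this sketch each block touches at most `block_cols` consecutive entries of `x`; a GPU version would presumably size the blocks so that slice fits in on-chip memory, but that sizing is not described in the text above.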
The semantic gap is one of the most important problems in virtualized computer systems. Solving it not only helps in developing security and virtual machine monitoring applications, but also benefits VMM resource management and VMM-based service implementation. In this paper, we first review the general architecture of virtual computer systems, …
GPUs provide powerful computing ability, especially for data-parallel algorithms. However, the complexity of the GPU system makes optimizing even a simple algorithm difficult. Different parallel algorithms or optimization methods on a GPU often lead to very different performance. The matrix-vector multiplication routine for general dense matrices …
Many-core GPUs provide high computing ability and substantial bandwidth; however, optimizing irregular applications like SpMV on GPUs is a difficult but meaningful task. In this paper, we propose a novel method to improve the performance of SpMV on GPUs. A new storage format called HYB-R is proposed to exploit the GPU architecture more efficiently. The COO …
Fungal bioaccumulation is a novel and highly promising approach to remediating polluted soil. The present study revealed that the fungus Pleurotus ostreatus HAU-2 has a high ability to tolerate Cd and Cr. However, high concentrations of Cd and Cr can suppress fungal growth and result in changes in hyphal micromorphology. Batch experiments were performed to …
This paper introduces PartitionSim, a parallel simulator for future thousand-core processors. The purpose of PartitionSim is to improve the simulation performance of many-core architectures at the cost of a small loss in accuracy. To achieve this goal, we propose a novel technique: timing partition. Timing partition is based on the following observation: in a …
Recent years have seen the emergence and rapid growth of various wearable devices, such as smart watches, wrist bands, rings, glasses, jewelry, and garments. Although wearable devices have a wide range of uses, healthcare is one of the most promising fields and is being exploited by a number of big enterprises and research institutes. However, existing …
This paper addresses workload partitioning strategies for simulating many-core architectures. The key observation behind this paper is that, compared to multicore architectures, many-core architectures feature more non-uniform memory access and unpredictable network traffic; these features degrade the simulation speed and accuracy of parallel discrete event simulators (PDES) in cases of …