Scheduling tasks with different weights in the imprecise computation model is rather difficult. Each task in the imprecise computation model is logically decomposed into a mandatory subtask and an optional subtask. The mandatory subtask must be completely executed before the deadline to produce an acceptable result; the optional subtask begins after the mandatory ...
The Preconditioned Conjugate Gradient (PCG) method has been demonstrated to be effective in solving large-scale linear systems with sparse, symmetric positive definite matrices. One critical problem in PCG is to design a good preconditioner, which can significantly reduce the runtime while keeping memory usage efficient. Universal preconditioners are simple ...
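As an illustration of where the preconditioner enters the PCG iteration, here is a minimal sketch in pure Python. It assumes a small dense matrix stored as a list of lists and uses the Jacobi (diagonal) preconditioner as a stand-in for a simple universal preconditioner; the function name pcg_jacobi and its parameters are hypothetical and are not taken from the paper.

```python
def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (list of lists).

    Assumption: Jacobi preconditioning, i.e. M = diag(A). This is an
    illustrative choice, not the preconditioner studied in the paper.
    """
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    inv_diag = [1.0 / A[i][i] for i in range(n)]  # M^{-1} for Jacobi

    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [inv_diag[i] * r[i] for i in range(n)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)

    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                # stop on small residual norm
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


# Small SPD example: the exact solution is x = [1/11, 7/11].
x = pcg_jacobi([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

The only change relative to plain conjugate gradient is the extra step z = M^{-1} r; a stronger preconditioner replaces that line at the cost of more setup time and memory, which is the trade-off the abstract refers to.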
In this paper, we investigate efficient algorithms and implementations that use a GPU together with a CPU to solve the rectangle intersection problem on a plane. The problem is to report all intersecting pairs of iso-oriented rectangles; parallelizing it on GPUs poses two major computational challenges: data partitioning and the massive output. The algorithm we ...
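To make the problem statement concrete, the following sketch is a brute-force sequential reference, not the GPU algorithm from the paper. It assumes each rectangle is a tuple (x_min, y_min, x_max, y_max) and that rectangles touching along an edge count as intersecting; both conventions and the function names are assumptions made for illustration.

```python
from itertools import combinations


def intersects(a, b):
    """Return True if two iso-oriented rectangles overlap.

    Assumption: rectangles are (x_min, y_min, x_max, y_max) and shared
    edges count as an intersection.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersecting_pairs(rects):
    """Report every intersecting pair as (i, j) with i < j, in O(n^2) time."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rects), 2)
        if intersects(a, b)
    ]


rects = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
pairs = intersecting_pairs(rects)  # [(0, 1)]
```

The output list can grow quadratically with the number of rectangles, which hints at why the massive output is named as one of the two challenges for a GPU implementation.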
As the computational power of Graphics Processing Units (GPUs) increases, data transmission becomes the major performance bottleneck. In this study, we investigate two techniques, data streaming and data compression, to reduce the communication cost on GPUs. Data streaming enables overlap of communication and computation, whereas data compression reduces the ...
This paper introduces a prototype of Taiwan UniCloud, a community-driven hybrid cloud platform for academics in Taiwan. The goal is to leverage resources in multiple clouds among different organizations. Each self-managing cloud can join the UniCloud platform to share its resources and simultaneously benefit from other clouds with scale-out capabilities. ...
Determinant Quantum Monte Carlo (DQMC) simulation has been widely used to reveal macroscopic properties of strongly correlated materials. However, parallelization of the DQMC simulation is extremely challenging due to the serial nature of the underlying Markov chain and numerical stability issues. We extend previous work by presenting a hybrid ...
Communication-aware task mapping algorithms, which map parallel tasks onto processing nodes according to the communication patterns of applications, are essential to reduce the communication time in modern high-performance computing. In this paper, we design algorithms specifically for interconnected multicore systems, whose architectural property, ...
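A minimal sketch of the quantity a communication-aware mapping tries to reduce, under assumptions that are not from the paper: traffic is a hypothetical dictionary of message volumes between task pairs, distance is a hypothetical node-to-node cost matrix, and the cost model is simply volume times distance.

```python
def communication_cost(traffic, mapping, distance):
    """Total cost of a task-to-node mapping.

    traffic:  {(src_task, dst_task): volume}   (hypothetical input format)
    mapping:  mapping[task] = node index
    distance: distance[a][b] = cost of one unit of traffic from node a to b
    Assumption: cost = sum of volume * distance over all communicating pairs.
    """
    return sum(
        volume * distance[mapping[src]][mapping[dst]]
        for (src, dst), volume in traffic.items()
    )


traffic = {(0, 1): 10, (1, 2): 1}    # tasks 0 and 1 exchange most of the data
distance = [[0, 1], [1, 0]]          # two nodes; on-node traffic is free

cost_colocated = communication_cost(traffic, [0, 0, 1], distance)  # 1
cost_split = communication_cost(traffic, [0, 1, 1], distance)      # 10
```

Placing the two heavily communicating tasks on the same node gives the lower cost in this toy example; a mapping algorithm searches for such placements over many tasks and nodes.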