A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (<i>e.g.</i> matrix multiplication and LINPACK) and bandwidth-intensive tasks with data locality (<i>e.g.</i> finite-difference simulation). Bandwidth-intensive tasks such as large-scale FFTs without data locality are harder to (More)
Previously we have shown that the transient receptor potential vanilloid 4 (TRPV4) channel regulates urinary bladder function, and that TRPV4 is expressed in both smooth muscle and urothelial cell types within the bladder wall.(1) Urothelial cells have also been suggested to express TRPV1 channels.(2) Therefore, we enzymatically isolated guinea-pig(More)
(UNU). It is based in Macau, and was founded in 1991. It started operations in July 1992. UNU/IIST is jointly funded by the Governor of Macau and the governments of the People's Republic of China and Portugal through a contribution to the UNU Endowment Fund. As well as providing two-thirds of the endowment fund, the Macau authorities also supply UNU/IIST(More)
This paper studies top-down program development techniques for Bulk-Synchronous Parallelism. In that context a specification formalism Logs, for 'the Logic of Global Synchrony', has been proposed for the specification and high-level development of BSP designs. This paper extends the use of Logs to provide support for the protection of local variables in BSP(More)
To provide fault tolerance to computer systems suffering from transient faults, checkpointing and rollback recovery is one of the most widely used techniques. Two primary checkpointing schemes have been proposed: independent and coordinated. However, most existing works address only the need of employing a single checkpointing and rollback(More)
In this paper we discuss our experiences in improving the performance of two key algorithms using CUDA: the single-precision matrix-matrix multiplication subprogram (SGEMM of BLAS) and the single-precision FFT. The former is computation-intensive, while the latter is memory-bandwidth- or communication-intensive. A peak performance of 393 Gflops is(More)