Graphics Processing Units (GPUs) have evolved into highly parallel and fully programmable architectures over the past five years, and the advent of CUDA has facilitated their use in many real-world applications. In this paper, we deal with a GPU implementation of Ant Colony Optimization (ACO), a population-based optimization method which comprises(More)
Microscopic imaging is an important tool for characterizing tissue morphology and pathology. 3D reconstruction and visualization of large sample tissue structure requires registration of large sets of high-resolution images. However, the scale of this problem presents a challenge for automatic registration methods. In this paper we present a novel method(More)
Vienna Fortran, High Performance Fortran (HPF) and other data parallel languages have been introduced to allow the programming of massively parallel distributed-memory machines (DMMP) at a relatively high level of abstraction based on the SPMD paradigm. Their main features include directives to express the distribution of data and computations across the(More)
A significant part of scientific codes consists of sparse matrix computations. In this work we propose two new pseudoregular data distributions for sparse matrices. The Multiple Recursive Decomposition (MRD) partitions the data using the prime factors of the dimensions of a multiprocessor network with mesh topology. Furthermore, we introduce a new storage(More)
Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic for the solution of a wide variety of problems. As a population-based algorithm, its computation is intrinsically massively parallel, and it is therefore theoretically well-suited for implementation on Graphics Processing Units (GPUs). The ACO algorithm comprises two main stages:(More)
We present a novel use of GPUs (Graphics Processing Units) for the analysis of histopathological images of neuroblastoma, a childhood cancer. Thanks to the advent of modern microscopy scanners, whole-slide histopathological images can now be acquired, but the computational costs to analyze these images using sophisticated image analysis algorithms are(More)
The last six years have seen Moore's Law continue to produce incredible gains in computational power. Indeed, the November 2007 list of the top ten fastest supercomputers in the world contained no machines with acceleration of any kind. The same list six years later has four of the ten fastest supercomputers using accelerators, including the top two(More)
High-level data-parallel languages such as Vienna Fortran and High Performance Fortran (HPF) have been introduced to allow the programming of massively parallel distributed-memory machines at a relatively high level of abstraction, based on the Single-Program-Multiple-Data (SPMD) paradigm. Their main features include mechanisms for expressing the(More)
Sparse matrix problems are difficult to parallelize efficiently on distributed memory machines since data is often accessed indirectly. Inspector/executor strategies, which are typically used to parallelize loops with indirect references, incur substantial run-time preprocessing overheads when references with multiple levels of indirection are encountered, a(More)
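As a rough illustration of the inspector/executor idea mentioned in the last abstract, the following is a minimal single-process sketch, not the method of any paper listed here. It assumes a block distribution in which one processor owns the slice `[lo, hi)` of a vector `x`; the names `inspector`, `executor`, `ghost`, `lo`, and `hi` are hypothetical and chosen only for this example. The inspector scans the index array once at run time to find off-processor references, and the executor then runs the loop using the fetched values.

```python
def inspector(col_idx, lo, hi):
    """Scan the indirect references and return the sorted list of
    indices that fall outside the locally owned range [lo, hi)."""
    return sorted({j for j in col_idx if not (lo <= j < hi)})


def executor(values, col_idx, local_x, lo, ghost):
    """Compute sum(values[k] * x[col_idx[k]]) using local data plus
    the off-processor values gathered into `ghost` (index -> value)."""
    total = 0.0
    for v, j in zip(values, col_idx):
        xj = ghost[j] if j in ghost else local_x[j - lo]
        total += v * xj
    return total


# Toy data: the full vector exists here only to simulate communication.
global_x = [1.0, 2.0, 3.0, 4.0]
lo, hi = 0, 2                      # this "processor" owns x[0:2]
values = [10.0, 20.0, 30.0]        # one sparse row
col_idx = [0, 3, 1]                # indirect references into x

needed = inspector(col_idx, lo, hi)            # preprocessing step
ghost = {j: global_x[j] for j in needed}       # stands in for a gather
result = executor(values, col_idx, global_x[lo:hi], lo, ghost)
```

In a real distributed setting the dictionary comprehension would be replaced by message passing, and the cost of the inspector pass is the run-time preprocessing overhead the abstract refers to.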