Graphics Processing Units (GPUs) have evolved into highly parallel and fully programmable architectures over the past five years, and the advent of CUDA has facilitated their application to many real-world problems. In this paper, we deal with a GPU implementation of Ant Colony Optimization (ACO), a population-based optimization method which comprises …
Microscopic imaging is an important tool for characterizing tissue morphology and pathology. Large sets of microscopic images are usually required for 3D reconstruction and visualization of tissue structure. Registration is essential for the 3D reconstruction from the stack of images. However, the large size of image datasets proves to be a challenge for …
Ant Colony Optimisation (ACO) is an effective population-based meta-heuristic for the solution of a wide variety of problems. As a population-based algorithm, its computation is intrinsically massively parallel, and it is therefore theoretically well-suited for implementation on Graphics Processing Units (GPUs). The ACO algorithm comprises two main stages: …
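The two main stages the abstract refers to are, in standard ACO formulations, tour construction and pheromone update. A minimal sequential sketch of both stages for a small symmetric TSP is given below; the parameter names (`alpha`, `beta`, `rho`, `Q`) are conventional ACO notation, not taken from the paper, and the code is illustrative rather than the paper's GPU implementation.

```python
import random

def construct_tour(dist, tau, alpha=1.0, beta=2.0):
    """Stage 1: one ant builds a tour, choosing each next city with
    probability proportional to pheromone^alpha * (1/distance)^beta."""
    n = len(dist)
    start = random.randrange(n)
    tour, visited = [start], {start}
    while len(tour) < n:
        i = tour[-1]
        choices = [j for j in range(n) if j not in visited]
        weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                   for j in choices]
        r, acc = random.random() * sum(weights), 0.0
        for j, w in zip(choices, weights):
            acc += w
            if acc >= r:
                tour.append(j)
                visited.add(j)
                break
        else:  # guard against floating-point underflow in the roulette wheel
            tour.append(choices[-1])
            visited.add(choices[-1])
    return tour

def tour_length(dist, tour):
    n = len(tour)
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

def update_pheromone(tau, tours, dist, rho=0.5, Q=1.0):
    """Stage 2: evaporate all pheromone, then deposit along each tour
    an amount inversely proportional to its length."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for tour in tours:
        deposit = Q / tour_length(dist, tour)
        for k in range(len(tour)):
            i, j = tour[k], tour[(k + 1) % len(tour)]
            tau[i][j] += deposit
            tau[j][i] += deposit
```

On a GPU, tour construction is the naturally data-parallel stage (one ant per thread or thread block), while the pheromone update is typically a parallel reduction over the ants' tours.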
We are currently witnessing the emergence of two paradigms in parallel computing: stream processing and multi-core CPUs. Represented by solid commercial products widely available in commodity PCs, GPUs and multi-core CPUs bring together an unprecedented combination of high performance at low cost. The scientific computing community needs to keep pace …
Vienna Fortran, High Performance Fortran (HPF) and other data-parallel languages have been introduced to allow the programming of massively parallel distributed-memory multiprocessors (DMMPs) at a relatively high level of abstraction based on the SPMD paradigm. Their main features include directives to express the distribution of data and computations across the …
High-level data-parallel languages such as Vienna Fortran and High Performance Fortran (HPF) have been introduced to allow the programming of massively parallel distributed-memory machines at a relatively high level of abstraction, based on the Single-Program-Multiple-Data (SPMD) paradigm. Their main features include mechanisms for expressing the …
Sparse matrix problems are difficult to parallelize efficiently on message-passing machines, since they access data through multiple levels of indirection. Inspector-executor strategies, which are typically used to parallelize such problems, impose significant preprocessing overheads. This paper describes the runtime support required by new compilation techniques …
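The inspector-executor strategy mentioned above can be sketched in a few lines: the inspector scans the indirection array once to build a communication schedule of off-process indices, and the executor reuses that schedule on every iteration to gather remote values before running the loop. This is an illustrative sketch, not the runtime interface the paper describes; the message exchange is simulated by reading from a globally visible array.

```python
def inspector(index, lo, hi):
    """Inspector: one pass over the indirection array records every
    off-process index (outside [lo, hi)), producing a reusable
    communication schedule."""
    return sorted({g for g in index if not (lo <= g < hi)})

def executor(y, x_global, index, lo, hi, schedule):
    """Executor: gather the scheduled remote values once (a stand-in for
    the message exchange), then run the indirect loop y[k] += x[index[k]]
    using only locally available data."""
    ghost = {g: x_global[g] for g in schedule}
    for k, g in enumerate(index):
        y[k] += x_global[g] if lo <= g < hi else ghost[g]
    return y
```

The preprocessing overhead the abstract refers to is the inspector pass itself; it pays off only when the same schedule is reused across many executor iterations, which is why reducing or amortizing this cost matters.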
This paper describes new compiler and run-time techniques to handle array accesses involving several levels of indirection, such as those arising in sparse and irregular problems. The lack of information at compile time in such problems has typically required the insertion of expensive runtime support. We propose new data distributions which can be used …