Manuel Ujaldon

Graphics Processing Units (GPUs) have evolved into highly parallel and fully programmable architectures over the past five years, and the advent of CUDA has facilitated their use in many real-world applications. In this paper, we deal with a GPU implementation of Ant Colony Optimization (ACO), a population-based optimization method which comprises ...
Vienna Fortran, High Performance Fortran (HPF) and other data parallel languages have been introduced to allow the programming of massively parallel distributed-memory machines (DMMP) at a relatively high level of abstraction based on the SPMD paradigm. Their main features include directives to express the distribution of data and computations across the ...
A significant part of scientific codes consists of sparse matrix computations. In this work we propose two new pseudoregular data distributions for sparse matrices. The Multiple Recursive Decomposition (MRD) partitions the data using the prime factors of the dimensions of a multiprocessor network with mesh topology. Furthermore, we introduce a new storage ...
High-level data-parallel languages such as Vienna Fortran and High Performance Fortran (HPF) have been introduced to allow the programming of massively parallel distributed-memory machines at a relatively high level of abstraction, based on the Single-Program-Multiple-Data (SPMD) paradigm. Their main features include mechanisms for expressing the ...
Sparse matrix problems are difficult to parallelize efficiently on message-passing machines, since they access data through multiple levels of indirection. Inspector/executor strategies, which are typically used to parallelize such problems, impose significant preprocessing overheads. This paper describes the runtime support required by new compilation techniques ...
Microscopic imaging is an important tool for characterizing tissue morphology and pathology. 3D reconstruction and visualization of large sample tissue structure requires registration of large sets of high-resolution images. However, the scale of this problem presents a challenge for automatic registration methods. In this paper we present a novel method ...
GPUs have recently attracted our attention as accelerators for a wide variety of algorithms, including assorted examples within the image analysis field. Among them, wavelets are gaining popularity as solid tools for data mining and video compression, though this comes at the expense of a high computational cost. After proving the effectiveness of the GPU ...
This paper describes new compiler and run-time techniques to handle array accesses involving several levels of indirection, such as those arising in sparse and irregular problems. The lack of information at compile-time in such problems has typically required the insertion of expensive runtime support. We propose new data distributions which can be used with ...