• Publications
Extending OpenMP to Survive the Heterogeneous Multi-Core Era
TLDR
A number of extensions to the OpenMP language, inspired by the StarSs programming model, allow the programmer to write portable code easily for a number of different platforms, relieving the programmer of writing the specific code to offload tasks to the accelerators and to synchronize tasks.
A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures
TLDR
This paper investigates whether OpenMP could still survive in this new scenario and proposes a possible way to extend the current specification to reasonably integrate heterogeneity while preserving simplicity and portability.
CC-Radix: a cache conscious sorting based on Radix sort
TLDR
CC-Radix improves data locality by dynamically partitioning the data set into subsets that fit in the L2 cache.
OmpSs@Zynq all-programmable SoC ecosystem
TLDR
This paper focuses on programmability and heterogeneous execution support, presenting a successful combination of the OmpSs programming model and the Zynq All-Programmable SoC platforms.
OpenMP extensions for FPGA accelerators
TLDR
Extensions to OpenMP 3.0 that try to address this second challenge are presented, along with an implementation in a prototype runtime system, and a hybrid host/device operational mode is proposed to hide some of these overheads, significantly improving the performance of the applications.
FPGA-Based Prototype of the Task Superscalar Architecture
TLDR
The first hardware implementation of a prototype of the Task Superscalar architecture is presented: an experimental task-based data scheduler that dynamically detects inter-task data dependencies, identifies task-level parallelism, and executes tasks out-of-order.
Hybrid Dataflow/von-Neumann Architectures
TLDR
This paper classifies hybrid dataflow/von-Neumann models according to two different taxonomies: one based on the execution model used for inter- and intrablock execution, and the other based on the integration level of both control and dataflow execution models.
Cell-Dock: high-performance protein-protein docking
TLDR
Cell-Dock is presented, an FFT-based docking algorithm adapted to the Cell BE processor that runs faster than FTDock with maximum speedups of above 200×, while achieving results of similar quality.
Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications
TLDR
The results on a real machine show that, when some strict programming rules are followed, individual SPE-to-SPE communication almost achieves the peak bandwidth when using the DMA controllers to transfer memory chunks of at least 1024 bytes, which should be considered in streaming programming.