Ana Lucia Varbanescu

This paper presents a comprehensive performance comparison between CUDA and OpenCL. We have selected 16 benchmarks ranging from synthetic applications to real-world ones. We make an extensive analysis of the performance gaps taking into account programming models, optimization strategies, architectural details, and underlying compilers. Our results show ...
The Cell Broadband Engine (BE) processor provides the potential to achieve an impressive level of performance for scientific applications. This level of performance can be reached by exploiting several dimensions of parallelism, such as thread-level parallelism using several Synergistic Processing Elements, data streaming parallelism, vector parallelism in ...
Multi-core platforms have proven themselves able to accelerate numerous HPC applications. But programming data-intensive applications on such platforms is a hard, and not yet solved, problem. Not only do modern processors favor compute-intensive code, they also have diverse architectures and incompatible programming models. And even after making a difficult ...
The performance potential of the Cell/B.E., as well as its availability, has attracted a lot of attention from various high-performance computing (HPC) fields. While computation-intensive kernels proved to be exceptionally well suited for running on the Cell, irregular data-intensive applications are usually considered poor matches. In this paper, we ...
Graph-processing platforms are increasingly used in a variety of domains. Although both industry and academia are developing and tuning graph-processing algorithms and platforms, the performance of graph-processing platforms has never been explored or compared in-depth. Thus, users face the daunting challenge of selecting an appropriate platform for their ...
With its design concept of cross-platform portability, OpenCL can be used not only on GPUs (for which it is quite popular), but also on CPUs. Whether porting GPU programs to CPUs, or simply writing new code for CPUs, using OpenCL brings up the performance issue, usually raised in one of two forms: "OpenCL is not performance portable!" or "Why using ...
Although GPUs are considered ideal to accelerate massively data-parallel applications, there are still exceptions to this rule. For example, imbalanced applications cannot be efficiently processed by GPUs: despite the massive data parallelism, a varied computational workload per data point remains GPU-unfriendly. To efficiently process imbalanced ...
Heterogeneous platforms integrating different processors, such as GPUs and multi-core CPUs, have become popular in high-performance computing. While most applications currently use only the homogeneous parts of these platforms, we argue that there is a large class of applications that can benefit from their heterogeneity: massively parallel imbalanced applications. ...