The work presented here is driven by two observations. First, heterogeneous architectures that integrate a CPU and a GPU on the same chip are emerging, and hold much promise for supporting power-efficient and scalable high performance computing. Second, MapReduce has emerged as a suitable framework for simplified parallel application development for many …
A broad class of applications involves indirect or data-dependent memory accesses; these are referred to as irregular applications. Recent developments in SIMD architectures – specifically, the emergence of wider SIMD lanes, the combination of SIMD parallelism with many-core MIMD parallelism, and more flexible programming APIs – are providing new …
Intel Xeon Phi (MIC architecture) is a relatively new accelerator chip that combines large-scale shared memory parallelism with wide SIMD lanes. Mapping applications onto a node with such an architecture to achieve high parallel efficiency is a major challenge. In this paper, we focus on developing a system for heterogeneous graph processing, which is able to …
Applying SIMD parallelization to irregular applications with non-continuous and data-dependent memory accesses is challenging. While an application involving a static pattern of indirect accesses (across iterations) can be accelerated by data transformations, such techniques are no longer feasible if the indirect access patterns change over time. In this …
Intra-node architectures for high performance machines have been evolving rapidly in recent years. We are seeing a diverse set of architectures, most of them with heterogeneous cores. This leads to two important questions for HPC programming: 1) how do we accelerate a single application using a heterogeneous collection of cores? 2) how do we develop …
Clusters with accelerators at each node have emerged as the dominant high-end architecture in recent years. Such systems can be extremely hard to program because of the underlying heterogeneity and the need for exploiting parallelism at multiple levels. Thus, easing parallel programming today requires not only high-level programming models, but ones from …
Online analytics based on runtime approximation has been widely adopted for meeting time and/or resource constraints. Though MapReduce has been gaining popularity in both the scientific and commercial sectors, there are several obstacles to implementing online analytics in a MapReduce implementation. In this paper, we present a MapReduce-like framework for …