Distributed graph processing systems increasingly require many compute nodes to cope with the requirements imposed by contemporary graph-based Big Data applications. However, increasing the number of compute nodes increases the chance of node failures. Therefore, provisioning an efficient failure recovery strategy is critical for distributed graph…
There is growing interest in replacing traditional servers with low-power multicore systems such as ARM Cortex-A9. However, such systems are typically provisioned for mobile applications that have lower memory and I/O requirements than server applications. Thus, the impact and extent of the imbalance between application and system resources in exploiting…
The continuous increase in the volume, variety and velocity of Big Data exposes datacenter resource scaling to an energy utilization problem. Traditionally, datacenters employ x86-64 (big) server nodes with power usage of tens to hundreds of watts. But lately, low-power (small) systems originally developed for mobile devices have seen significant improvements…
Memory contention is an important performance issue in current multicore architectures. In this paper, we focus on understanding how off-chip memory contention affects the performance of parallel applications. Using measurements conducted on state-of-the-art multicore systems, we observed that off-chip memory traffic is not always bursty, as it was…
Parallel programming has moved from HPC into the mainstream, enabled by a growing number of programming models, languages and methodologies, as well as the availability of multicore systems. However, performance analysis of parallel programs is still difficult, especially for large and complex programs, or applications developed using different…
This paper analyzes the performance of three systems for in-memory data management: Memcached, Redis and the Resilient Distributed Datasets (RDD) implemented by Spark. By performing a thorough performance analysis of both analytics operations and fine-grained object operations such as set/get, we show that none of the systems efficiently handles both types of…
Traditional datacenter systems advocate the use of high-performance hardware, resulting in increased power consumption and cooling costs. With increasing availability of systems having diverse performance-to-power ratios, we analyze the energy efficiency of mixing high-performance and low-power nodes in a cluster. Using a model-driven analysis, we predict…
Multicore systems are increasingly adopted across many application domains. Consequently, understanding their performance is becoming an important issue for a growing number of users. However, performance analysis of parallel programs on multicore systems is still challenging, especially for large programs or applications developed in multiple programming…
Traditionally, imperative programming uses a series of state-based operands to model control-flow and, as a result, suffers from the well-known von Neumann bottleneck. In contrast, dataflow programs are driven only by the availability of instruction operands. However, the lack of mainstream dataflow hardware hinders direct dataflow instruction execution. On…