Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multiprocessors (CMPs) can be problematic when multiple cores compete for a shared on-chip cache (L2 or L3). In this paper, we (i) quantify the impact of conventional data prefetching...
As data sizes continue to increase, the concept of active storage is well fitted for many data analysis kernels. Nevertheless, while this concept has been investigated and deployed in a number of forms, enabling it from the parallel I/O software stack has been largely unexplored. In this paper, we propose and evaluate an active storage system that allows...
Disk power consumption is becoming an increasingly important issue in high-end servers that execute large-scale data-intensive applications. In particular, array-based scientific codes can spend a significant portion of their power budget on the disk subsystem. Observing this, prior research proposed several strategies, such as spinning down to...
Power consumption of disk-based storage systems is becoming an increasingly pressing issue for both commercial and scientific application domains. Prior work proposed several hardware-based approaches to reducing disk power consumption by making use of techniques such as spinning down idle disks and rotating them at lower speeds than the maximum speed...
Software-related issues such as code/data mapping, high-level communication management, and power control are becoming increasingly important as we move towards NoC (network-on-chip) based CMPs (chip multiprocessors). This paper presents an ILP (integer linear programming) based formulation of the problem of communication energy minimization in such...
Data checkpointing is an important fault tolerance technique in High Performance Computing (HPC) systems. As HPC systems move towards exascale, the storage space and time costs of checkpointing threaten to overwhelm not only the simulation but also the post-simulation data analysis. One common practice to address this problem is to apply compression...
The problem attacked in this paper is one of automatically mapping an application onto a Network-on-Chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fashion. The proposed compiler approach has four major steps: task scheduling, processor mapping, data mapping, and packet routing. In the first step, the application code is...
Excessive power consumption is becoming a major barrier to extracting the maximum performance from high-performance parallel systems. Therefore, techniques oriented towards reducing power consumption of such systems are expected to become increasingly important in the future. Since disk systems of high-performance architectures are known to constitute a...