Latchesar Ionkov

As compute nodes increase in parallelism, existing intra-node locking and synchronization primitives need to be scalable, fast, and power efficient. Most parallel runtime systems try to find a balance between these properties during synchronization by fine-tuned spin-waiting and processor yielding to the OS. Unfortunately, the code path followed by the OS ...
The advent of many-core processors is imposing many changes on the operating system. The resources that are under contention have changed; previously, CPU cycles were the resource in demand and required fair and precise sharing. Now compute cycles are plentiful, but the memory per core is decreasing. In the past, scientific applications used all the CPU ...
Long-running HPC applications guard against node failures by writing checkpoints to parallel file systems. Writing these checkpoints with petascale-class machines has proven difficult, and the increased concurrency demands of exascale computing will exacerbate this problem. To meet checkpointing demands and sustain application-perceived throughput at ...
In this work we address the growing need for mechanisms for intranode application composition. We provide a novel shared memory interface that allows composite applications (two or more coupled applications) to share internal data structures without blocking. This allows independent progress of the applications such that they can proceed in a parallel, ...
Until recently, most scientific applications produced data that was saved, analyzed, and visualized at a later time. In recent years, with the large increase in the amount of data and computational power available, there is demand for applications to support data access in situ, or close to the simulation, to provide application steering, analytics, and visualization. ...
The execution of an SPMD application involves running multiple instances of a process with possibly varying arguments. With the widespread adoption of massively multicore processors, there has been a focus on harnessing the abundant compute resources effectively in a power-efficient manner. Although much work has been done towards optimizing distributed ...
In this paper we present a new programming model for the Cell BE architecture called CellFS. CellFS aims to simplify the task of managing I/O between the local store of the synergistic processing units and the main memory of the Cell. The CellFS support library provides the means for transferring data via simple file I/O operations, therefore eliminating the ...
The data volume of many scientific applications has substantially increased in the past decade and continues to increase due to the rising needs of high-resolution and fine-granularity scientific discovery. The data movement between storage and compute nodes has become a critical performance factor and has attracted intense research and development attention ...