Irina Calciu

Non-Uniform Memory Access (NUMA) architectures are gaining importance in mainstream computing systems due to the rapid growth of multi-core, multi-chip machines. Extracting the best possible performance from these new machines will require us to revisit the design of the concurrent algorithms and synchronization primitives that form the building blocks of …
Intel’s Haswell and IBM’s Blue Gene/Q and System Z are the first commercially available systems to include hardware transactional memory (HTM). However, they are all best-effort, meaning that every hardware transaction must have an alternative software fallback path that guarantees forward progress. The simplest and most widely used software fallback is a …
The Intel Haswell processor includes restricted transactional memory (RTM), which is the first commodity hardware transactional memory (HTM) to become publicly available. However, like other real HTMs, such as IBM's Blue Gene/Q, Haswell's RTM is best-effort, meaning it provides no transactional forward progress guarantees. Because of this, a software …
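The two summaries above describe the same mechanism: because best-effort HTM gives no progress guarantee, each hardware transaction is paired with a software path, most commonly a single global fallback lock (transactional lock elision). The sketch below shows that common pattern only; it is not code from these papers. It assumes an RTM-capable CPU, compilation with -mrtm, and illustrative names (run_atomically, fallback_lock, MAX_RETRIES).

```cpp
// Minimal sketch of best-effort HTM with a global-lock software fallback.
// Assumes an RTM-capable CPU and GCC/Clang with -mrtm; names are illustrative.
#include <immintrin.h>
#include <atomic>

static std::atomic<bool> fallback_lock{false};
constexpr int MAX_RETRIES = 5;

template <typename F>
void run_atomically(F critical_section) {
    for (int attempt = 0; attempt < MAX_RETRIES; ++attempt) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            // Read the fallback lock inside the transaction so that any thread
            // taking the software path aborts us and keeps both paths consistent.
            if (fallback_lock.load(std::memory_order_relaxed))
                _xabort(0xff);
            critical_section();
            _xend();
            return;
        }
        // Transaction aborted: wait until the lock is free before retrying.
        while (fallback_lock.load(std::memory_order_relaxed)) { /* spin */ }
    }
    // Software fallback path: a single global lock guarantees forward progress.
    while (fallback_lock.exchange(true, std::memory_order_acquire)) { /* spin */ }
    critical_section();
    fallback_lock.store(false, std::memory_order_release);
}
```

Subscribing to the fallback lock inside the transaction is what makes the two paths mutually exclusive: once a thread acquires the lock, every in-flight transaction that has read it is aborted.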
Even for small multi-core systems, it has become harder and harder to support a simple shared memory abstraction: processors access some memory regions more quickly than others, a phenomenon called non-uniform memory access (NUMA). These trends have prompted researchers to investigate alternative programming abstractions based on message passing rather than …
Emerging cache-coherent non-uniform memory access (ccNUMA) architectures provide cache coherence across hundreds of cores. These architectures change how applications perform: while local memory accesses can be fast, remote memory accesses suffer from high access times and increased interconnect contention. Because of these costs, the performance of legacy code …
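As one concrete illustration of the local-versus-remote cost described above, the sketch below uses Linux's libnuma to place a buffer's pages on a chosen node so that threads running there pay local rather than remote latency. The buffer size and node choice are assumptions made for the example; this is not code from the work summarized here.

```cpp
// Minimal sketch of NUMA-aware placement with libnuma (Linux, link with -lnuma).
#include <numa.h>
#include <cstdio>

int main() {
    if (numa_available() < 0) {
        std::fprintf(stderr, "NUMA is not supported on this system\n");
        return 1;
    }
    const size_t BUF_BYTES = 64 * 1024 * 1024;   // illustrative buffer size
    int node = numa_max_node();                  // pick the last node as an example
    // Allocate the buffer's pages on that node instead of wherever the
    // first-touching thread happens to run.
    void* buf = numa_alloc_onnode(BUF_BYTES, node);
    if (buf == nullptr) return 1;
    // ... pin worker threads to CPUs on `node` and operate on buf ...
    numa_free(buf, BUF_BYTES);
    return 0;
}
```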
Priority queues are fundamental abstract data structures, often used to manage limited resources in parallel programming. Several proposed parallel priority queue implementations are based on skiplists, harnessing the potential for parallelism of the add() operations. In addition, methods such as Flat Combining have been proposed to reduce contention by …
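For readers unfamiliar with the combining approach mentioned above, here is a minimal flat-combining sketch: threads publish requests in per-thread slots, and whichever thread acquires the combiner lock applies all pending requests to a sequential priority queue in one pass. It assumes a fixed thread count and a single insert operation, and it omits the optimizations of real flat-combining implementations.

```cpp
// Minimal flat-combining sketch over a sequential priority queue.
#include <atomic>
#include <queue>
#include <thread>
#include <vector>

constexpr int NTHREADS = 4;

struct Slot {
    std::atomic<bool> pending{false};  // request posted, not yet served
    int value = 0;                     // argument of the insert() request
};

static Slot slots[NTHREADS];
static std::atomic<bool> combiner_lock{false};
static std::priority_queue<int> pq;    // sequential structure, combiner-only

// Post an insert(value) request and wait until some combiner applies it.
void fc_insert(int tid, int value) {
    slots[tid].value = value;
    slots[tid].pending.store(true, std::memory_order_release);

    while (slots[tid].pending.load(std::memory_order_acquire)) {
        // Try to become the combiner.
        if (!combiner_lock.exchange(true, std::memory_order_acquire)) {
            // Serve every pending request, including our own, in one pass.
            for (int i = 0; i < NTHREADS; ++i) {
                if (slots[i].pending.load(std::memory_order_acquire)) {
                    pq.push(slots[i].value);
                    slots[i].pending.store(false, std::memory_order_release);
                }
            }
            combiner_lock.store(false, std::memory_order_release);
        }
    }
}

int main() {
    std::vector<std::thread> workers;
    for (int t = 0; t < NTHREADS; ++t)
        workers.emplace_back([t] { for (int i = 0; i < 1000; ++i) fc_insert(t, t * 1000 + i); });
    for (auto& w : workers) w.join();
    return pq.size() == static_cast<size_t>(NTHREADS) * 1000 ? 0 : 1;
}
```

The combiner serves many requests while holding the lock once, which is what reduces contention on the shared structure compared with every thread locking it individually.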
Programmers can use the upcoming non-volatile memory (NVM) technology in various ways. One appealing way is to store critical application data structures directly in NVM instead of serializing them to block storage. Changing legacy code to achieve this, however, is laborious and prone to bugs. We present NVMOVE, a tool that simplifies this transition by …
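The idea of storing a structure directly in NVM, rather than serializing it, can be illustrated with an ordinary memory-mapped file standing in for persistent memory. The file name and Counter type below are assumptions made for the example, and real NVM code would typically map a DAX device and flush with CLWB or PMDK's pmem_persist instead of msync; none of this is part of NVMOVE.

```cpp
// Minimal sketch: keep a structure in a persistent mapping and update it in place.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

struct Counter {            // the "critical application data structure"
    long hits;
    long misses;
};

int main() {
    int fd = open("counter.pmem", O_RDWR | O_CREAT, 0644);
    if (fd < 0) return 1;
    if (ftruncate(fd, sizeof(Counter)) != 0) return 1;

    // Map the structure itself into the address space: no serialization step.
    void* mem = mmap(nullptr, sizeof(Counter), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) return 1;
    auto* c = static_cast<Counter*>(mem);

    c->hits += 1;                            // update in place
    msync(mem, sizeof(Counter), MS_SYNC);    // make the update durable

    std::printf("hits=%ld misses=%ld\n", c->hits, c->misses);
    munmap(mem, sizeof(Counter));
    close(fd);
    return 0;
}
```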
High-performance servers are Non-Uniform Memory Access (NUMA) machines. To fully leverage these machines, programmers need efficient concurrent data structures that are aware of the NUMA performance artifacts. We propose Node Replication (NR), a black-box approach to obtaining such data structures. NR takes an arbitrary sequential data structure and …
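A highly simplified sketch of the structure behind that idea follows: one replica of the sequential data structure per NUMA node, plus a shared operation log that every replica replays before serving operations. The types NrMap, SharedLog, and Replica are illustrative, and the sketch deliberately omits NR's per-node flat combining, its lock-free log, and log recycling, so it only conveys the shape of the approach.

```cpp
// Simplified per-node replication over a shared operation log.
#include <cstddef>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

using Op = std::pair<int, int>;            // (key, value) insert as the only update

struct SharedLog {
    std::mutex m;
    std::vector<Op> ops;                   // append-only operation log
};

struct Replica {
    std::mutex m;
    std::map<int, int> data;               // the sequential data structure
    std::size_t applied = 0;               // log prefix already replayed here
};

class NrMap {
    SharedLog log_;
    std::vector<Replica> replicas_;

    // Replay any log entries this replica has not yet seen.
    void sync(Replica& r) {
        std::lock_guard<std::mutex> lg(log_.m);
        for (; r.applied < log_.ops.size(); ++r.applied)
            r.data[log_.ops[r.applied].first] = log_.ops[r.applied].second;
    }

public:
    explicit NrMap(int numa_nodes) : replicas_(numa_nodes) {}

    void insert(int node, int key, int value) {
        {   // Make the update visible to every replica via the shared log.
            std::lock_guard<std::mutex> lg(log_.m);
            log_.ops.emplace_back(key, value);
        }
        std::lock_guard<std::mutex> lr(replicas_[node].m);
        sync(replicas_[node]);             // keep the local replica current
    }

    bool lookup(int node, int key, int& out) {
        Replica& r = replicas_[node];
        std::lock_guard<std::mutex> lr(r.m);
        sync(r);                           // reads observe all prior updates
        auto it = r.data.find(key);
        if (it == r.data.end()) return false;
        out = it->second;
        return true;
    }
};
```

The payoff of this shape is that most reads are served from a node-local replica, so cross-node traffic is limited to appending to and replaying the shared log.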
The performance gap between memory and CPU has grown exponentially. To bridge this gap, hardware architects have proposed near-memory computing (also called processing-in-memory, or PIM), where a lightweight processor (called a PIM core) is located close to memory. Due to its proximity to memory, a memory access from a PIM core is much faster than that from …
Abstract of “Concurrent Algorithms for Emerging Hardware Platforms” by Irina Calciu, Ph.D., Brown University, May 2015. Computer architecture has recently seen an explosion of innovation that has enabled more parallel execution, while parallel software systems have been making strides in providing simpler programming models. The number of computing cores …