Seth H. Pugsley

Learn More
While Processing-in-Memory has been investigated for decades, it has not been embraced commercially. A number of emerging technologies have renewed interest in this topic. In particular, the emergence of 3D stacking and the imminent release of Micron's Hybrid Memory Cube device have made it more practical to move computation near memory. However, the(More)
USIMM, the Utah SImulated Memory Module, is a DRAM main memory system simulator that is being released for use in the Memory Scheduling Championship (MSC), organized in conjunction with ISCA-39. MSC is part of the JILP Workshops on Computer Architecture Competitions (JWAC). This report describes the simulation infrastructure and how it will be used within(More)
Snooping and directory-based coherence protocols have become the de facto standard in chip multi-processors, but neither design is without drawbacks. Snooping protocols are not scalable, while directory protocols incur directory storage overhead, frequent indirections, and are more prone to design bugs. In this paper, we propose a novel coherence protocol(More)
In a hardware transactional memory system with lazy versioning and lazy conflict detection, the process of transaction commit can emerge as a bottleneck. This is especially true for a large-scale distributed memory system where multiple transactions may attempt to commit simultaneously and coordination is required before allowing commits to proceed in(More)
Memory latency is a major factor in limiting CPU performance, and prefetching is a well-known method for hiding memory latency. Overly aggressive prefetching can waste scarce resources such as memory bandwidth and cache capacity, limiting or even hurting performance. It is therefore important to employ prefetching mechanisms that use these resources(More)
......Warehouse-scale computing is dominated by systems using commodity hardware to execute workloads, processing large amounts of data distributed among many disks. Several frameworks, such as MapReduce, have emerged in recent years to facilitate the management and development of big-data workloads. These systems rely on disks for most data accesses, but(More)
Prior work in hardware prefetching has focused mostly on either predicting regular streams with uniform strides, or predicting irregular access patterns at the cost of large hardware structures. This paper introduces the Variable Length Delta Prefetcher (VLDP), which builds up delta histories between successive cache line misses within physical pages, and(More)
Multiple virtual machines (VMs) are typically co-scheduled on cloud servers. Each VM experiences different latencies when accessing shared resources, based on contention from other VMs. This introduces timing channels between VMs that can be exploited to launch attacks by an untrusted VM. This paper focuses on trying to eliminate the timing channel in the(More)
Designing prefetchers to maximize system performance often requires a delicate balance between coverage and accuracy. Achieving both high coverage and accuracy is particularly challenging in workloads with complex address patterns, which may require large amounts of history to accurately predict future addresses. This paper describes the Signature Path(More)
Future large-scale multi-cores will likely be best suited for use within high-performance computing (HPC) domains. A large fraction of HPC workloads employ the messagepassing interface (MPI), yet multi-cores continue to be optimized for shared-memory workloads. In this position paper, we put forth the design of a unique chip that is optimized for MPI(More)