Learn More
The gem5 simulation infrastructure is the merger of the best aspects of the M5 [4] and GEMS [9] simulators. M5 provides a highly configurable simulation framework, multiple ISAs, and diverse CPU models. GEMS complements these features with a detailed and exible memory system, including support for multiple cache coherence protocols and interconnect models.(More)
Most modern cores perform a highly-associative transaction look aside buffer (TLB) lookup on every memory access. These designs often hide the TLB lookup latency by overlapping it with L1 cache access, but this overlap does not hide the power dissipated by TLB lookups. It can even exacerbate the power dissipation by requiring higher associativity L1 cache.(More)
Many future heterogeneous systems will integrate CPUs and GPUs physically on a single chip and logically connect them via shared memory to avoid explicit data copying. Making this shared memory coherent facilitates programming and fine-grained sharing, but throughput-oriented GPUs can overwhelm CPUs with coherence requests not well-filtered by caches.(More)
The overheads of memory management units (MMUs) have gained importance in today's systems. Detailed simulators may be too slow to gain insights into micro-architectural techniques that improve MMU efficiency. To address this issue, we propose a novel tool, BadgerTrap, which allows online instrumentation of TLB misses. It allows first-order analysis of new(More)
Our analysis shows that many "big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory. They consume as much as 10% of execution cycles on TLB misses, even using large pages. On the other hand, we find that these workloads use read-write permission on most pages, are provisioned(More)
Addresses suffering from cache misses typically exhibit repetitive patterns due to the temporal locality inherent in the access stream. However, we observe that the number of in- tervening misses at the last-level cache between the eviction of a particular block and its reuse can be very large, pre- venting traditional victim caching mechanisms from(More)
Virtualization provides value for many workloads, but its cost rises for workloads with poor memory access locality. This overhead comes from translation look aside buffer (TLB) misses where the hardware performs a 2D page walk (up to 24 memory references on x86-64) rather than a native TLB miss (up to only 4 memory references). The first dimension(More)
Traditional exhibitions in museums provide limited possibilities of interaction between the visitor and their artefacts. Usually, interaction is confined to reading labels with little information on the exhibits, shop booklets, and audio guided tours. These forms of interaction provide minimal information, and do not respond to a visitor's personalized(More)
This paper presents a low cost architecture that extensively uses XML technologies to present generic digital content using the web and augmented reality. We describe our client-server architecture with particular emphasis on unique XML schemas that are used to design a generic XML repository and illustrate its use with two different application scenarios.(More)