Accurate simulation is essential for the proper design and evaluation of any computing platform. With the current move toward the era of CPU-GPU heterogeneous computing, researchers need a simulation framework that can model both kinds of computing devices and their interaction. In this paper, we present Multi2Sim, an open-source, modular, and fully…
Current microprocessors are based on complex designs that integrate different components on a single chip, such as hardware threads, processor cores, memory hierarchies, or interconnection networks. The constant need to evaluate new designs for each of these components motivates the development of tools that simulate the system working as a whole. In this…
Silicon-photonic link technology promises to satisfy the growing need for high-bandwidth, low-latency, and energy-efficient network-on-chip (NoC) architectures. While silicon-photonic NoC designs have been extensively studied for future many-core systems, their use in massively-threaded GPUs has received little attention to date. In this paper, we first…
As we move into a new era of heterogeneous multi-core systems, tuning the performance and understanding the reliability of both hardware and software become more challenging. Given the multiplicity of different design trade-offs in hardware and software, and the rate of introduction of new architectures and hardware/software features, it becomes…
The validation buffer (VB) microarchitecture retires instructions out of order by replacing the classical reorder buffer (ROB) with the VB structure. The VB removes the negative effect of long-latency instructions located at the ROB head, which prevent other instructions from retiring and cause frequent pipeline stalls due to lack of space in the ROB. This work analyzes…
Nowadays, embedded systems can be found in a wide range of pervasive devices (e.g., smartphones, PDAs, or video/digital cameras). These devices contain large cache memories, whose power consumption can reach about 50% of the total energy consumed, of which leakage energy is the predominant fraction in current technologies. This paper proposes a technique to…
Current superscalar processors commit instructions in program order by using a reorder buffer (ROB). The ROB provides support for speculation, precise exceptions, and register reclamation. However, committing instructions in program order may lead to significant performance degradation if a long-latency operation blocks the ROB head. Several proposals have…
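The ROB-head blocking effect described above can be illustrated with a toy model. The following is a minimal sketch, not taken from any of the listed papers: the function name `simulate`, the one-instruction-per-cycle dispatch rate, and the idealized "relaxed" retirement mode (which ignores speculation and precise exceptions entirely) are all assumptions made for illustration.

```python
def simulate(latencies, rob_size, in_order=True):
    """Toy model of a ROB (hypothetical, for illustration only).

    latencies: execution latency in cycles of each instruction, in program order.
    rob_size:  number of ROB entries.
    in_order:  if True, only completed instructions at the ROB head may retire;
               if False, any completed instruction retires (idealized model).
    Returns (total_cycles, dispatch_stall_cycles).
    """
    rob = []              # completion cycle of each in-flight instruction, program order
    next_instr = 0        # index of the next instruction to dispatch
    committed = 0
    cycle = 0
    dispatch_stalls = 0
    n = len(latencies)

    while committed < n:
        cycle += 1

        # Retirement stage.
        if in_order:
            # Only the head may retire; a slow head blocks everything behind it.
            while rob and rob[0] <= cycle:
                rob.pop(0)
                committed += 1
        else:
            # Idealized: free every entry whose instruction has completed.
            still_running = [c for c in rob if c > cycle]
            committed += len(rob) - len(still_running)
            rob = still_running

        # Dispatch stage: at most one instruction per cycle, if an entry is free.
        if next_instr < n:
            if len(rob) < rob_size:
                rob.append(cycle + latencies[next_instr])
                next_instr += 1
            else:
                dispatch_stalls += 1  # ROB full: the front end stalls

    return cycle, dispatch_stalls


if __name__ == "__main__":
    # One long-latency operation followed by short ones.
    program = [100] + [1] * 20
    print("in-order commit:", simulate(program, rob_size=4, in_order=True))
    print("relaxed commit: ", simulate(program, rob_size=4, in_order=False))
```

With a small ROB, the in-order run spends most of its cycles stalled behind the long-latency instruction, while the relaxed run keeps dispatching. Real out-of-order commit proposals must additionally preserve the speculation and exception support that this sketch leaves out.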
In this article, we describe how to ease memory management between a Central Processing Unit (CPU) and one or multiple discrete Graphics Processing Units (GPUs) by architecting a novel hardware-based Unified Memory Hierarchy (UMH). With UMH, a GPU accesses the CPU memory only if it does not find its required data in the directories associated with its…
Graphics Processing Units (GPUs) are popular hardware accelerators for data-parallel applications, enabling the execution of thousands of threads in a Single Instruction Multiple Thread (SIMT) fashion. However, the SIMT execution model is not efficient when code includes critical sections to protect access to data shared by the running threads. In…
Modern superscalar processors implement register renaming by using either RAM or CAM tables. The design of these structures should address their access time and misprediction recovery penalty. While direct-mapped RAMs provide faster access times, CAMs are more appropriate to avoid recovery penalties. Although they are more complex and slower, CAMs usually…