José F. Martínez

Learn More
This paper presents <i>core fusion,</i> a reconfigurable chip multiprocessor(CMP) architecture where groups of fundamentally independent cores can dynamically morph into a larger CPU, or they can be used as distinct processing elements, as needed at run time by applications. Core fusion gracefully accommodates software diversity and incremental(More)
Although silicon optical technology is still in its formative stages, and the more near-term application is chip-to-chip communication, rapid advances have been made in the development of on-chip optical interconnects. In this paper, we investigate the integration of CMOS-compatible optical technology to on-chip cache-coherent buses in future CMPs. While(More)
Previous proposals for power-aware thread-level parallelism on chip multiprocessors (CMPs) mostly focus on multiprogrammed workloads. Nonetheless, parallel computation of a single application is critical in light of the expanding performance demands of important future workloads. This work addresses the problem of dynamically optimizing power consumption of(More)
Efficient sharing of system resources is critical to obtaining high utilization and enforcing system-level performance objectives on chip multiprocessors (CMPs). Although several proposals that address the management of a single microarchitectural resource have been published in the literature, coordinated management of multiple interacting resources on(More)
Aggressive CMOS scaling will make future chip multiprocessors (CMPs) increasingly susceptible to transient faults, hard errors, manufacturing defects, and process variations. Existing fault-tolerant CMP proposals that implement dual modular redundancy (DMR) do so by statically binding pairs of adjacent cores via dedicated communication channels and buffers.(More)
Long-latency loads are critical in today's processors due to the ever-increasing speed gap with memory. Not only do these loads block the execution of dependent instructions, they also prevent other instructions from moving through the in-order reorder buffer (ROB) and retire. As a result, the processor quickly fills up with uncommitted instructions, and(More)
Speculative parallelization aggressively executes in parallel codes that cannot be fully parallelized by the compiler. Past proposals of hardware schemes have mostly focused on single-chip multiprocessors (CMPs), whose effectiveness is necessarily limited by their small size. Very few schemes have attempted this technique in the context of scalable(More)
This paper presents <i>CHeckpointed Early Resource RecYcling (Cherry)</i>, a hybrid mode of execution based on ROB and checkpointing that decouples resource recycling and instruction retirement. Resources are recycled early, resulting in a more efficient utilization. Cherry relies on state checkpointing and rollback to service exceptions for instructions(More)
Barriers, locks, and flags are synchronizing operations widely used programmers and parallelizing compilers to produce race-free parallel programs. Often times, these operations are placed suboptimally, either because of conservative assumptions about the program, or merely for code simplicity.We propose <i>Speculative Synchronization,</i> which applies the(More)
Efficiently utilizing off-chip DRAM bandwidth is a critical issuein designing cost-effective, high-performance chip multiprocessors(CMPs). Conventional memory controllers deliver relativelylow performance in part because they often employ fixed,rigid access scheduling policies designed for average-case applicationbehavior. As a result, they cannot learn and(More)